- 08 1月, 2021 5 次提交
-
-
由 Ido Schimmel 提交于
In case of error, remove the nexthop group entry from the list to which it was previously added. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NPetr Machata <petrm@nvidia.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ido Schimmel 提交于
A reference was not taken for the current nexthop entry, so do not try to put it in the error path. Fixes: 430a0491 ("nexthop: Add support for nexthop groups") Signed-off-by: NIdo Schimmel <idosch@nvidia.com> Reviewed-by: NPetr Machata <petrm@nvidia.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
Conntrack reassembly records the largest fragment size seen in IPCB. However, when this gets forwarded/transmitted, fragmentation will only be forced if one of the fragmented packets had the DF bit set. In that case, a flag in IPCB will force fragmentation even if the MTU is large enough. This should work fine, but this breaks with ip tunnels. Consider client that sends a UDP datagram of size X to another host. The client fragments the datagram, so two packets, of size y and z, are sent. DF bit is not set on any of these packets. Middlebox netfilter reassembles those packets back to single size-X packet, before routing decision. packet-size-vs-mtu checks in ip_forward are irrelevant, because DF bit isn't set. At output time, ip refragmentation is skipped as well because x is still smaller than the mtu of the output device. If ttransmit device is an ip tunnel, the packet size increases to x+overhead. Also, tunnel might be configured to force DF bit on outer header. In this case, packet will be dropped (exceeds MTU) and an ICMP error is generated back to sender. But sender already respects the announced MTU, all the packets that it sent did fit the announced mtu. Force refragmentation as per original sizes unconditionally so ip tunnel will encapsulate the fragments instead. The only other solution I see is to place ip refragmentation in the ip_tunnel code to handle this case. Fixes: d6b915e2 ("ip_fragment: don't forward defragmented DF packet") Reported-by: NChristian Perle <christian.perle@secunet.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
For some reason ip_tunnel insist on setting the DF bit anyway when the inner header has the DF bit set, EVEN if the tunnel was configured with 'nopmtudisc'. This means that the script added in the previous commit cannot be made to work by adding the 'nopmtudisc' flag to the ip tunnel configuration. Doing so breaks connectivity even for the without-conntrack/netfilter scenario. When nopmtudisc is set, the tunnel will skip the mtu check, so no icmp error is sent to client. Then, because inner header has DF set, the outer header gets added with DF bit set as well. IP stack then sends an error to itself because the packet exceeds the device MTU. Fixes: 23a3647b ("ip_tunnels: Use skb-len to PMTU check.") Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NPablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Sean Tranchetti 提交于
Route removal is handled by two code paths. The main removal path is via fib6_del_route() which will handle purging any PMTU exceptions from the cache, removing all per-cpu copies of the DST entry used by the route, and releasing the fib6_info struct. The second removal location is during fib6_add_rt2node() during a route replacement operation. This path also calls fib6_purge_rt() to handle cleaning up the per-cpu copies of the DST entries and releasing the fib6_info associated with the older route, but it does not flush any PMTU exceptions that the older route had. Since the older route is removed from the tree during the replacement, we lose any way of accessing it again. As these lingering DSTs and the fib6_info struct are holding references to the underlying netdevice struct as well, unregistering that device from the kernel can never complete. Fixes: 2b760fcf ("ipv6: hook up exception table to store dst cache") Signed-off-by: NSean Tranchetti <stranche@codeaurora.org> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/1609892546-11389-1-git-send-email-stranche@quicinc.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 06 1月, 2021 3 次提交
-
-
由 Qinglang Miao 提交于
A null-ptr-deref bug is reported by Hulk Robot like this: -------------- KASAN: null-ptr-deref in range [0x0000000000000128-0x000000000000012f] Call Trace: qrtr_ns_remove+0x22/0x40 [ns] qrtr_proto_fini+0xa/0x31 [qrtr] __x64_sys_delete_module+0x337/0x4e0 do_syscall_64+0x34/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x468ded -------------- When qrtr_ns_init fails in qrtr_proto_init, qrtr_ns_remove which would be called later on would raise a null-ptr-deref because qrtr_ns.workqueue has been destroyed. Fix it by making qrtr_ns_init have a return value and adding a check in qrtr_proto_init. Reported-by: NHulk Robot <hulkci@huawei.com> Signed-off-by: NQinglang Miao <miaoqinglang@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
VLAN checks for NETREG_UNINITIALIZED to distinguish between registration failure and unregistration in progress. Since commit cb626bf5 ("net-sysfs: Fix reference count leak") registration failure may, however, result in NETREG_UNREGISTERED as well as NETREG_UNINITIALIZED. This fix is similer to cebb6975 ("rtnetlink: Fix memory(net_device) leak when ->newlink fails") Fixes: cb626bf5 ("net-sysfs: Fix reference count leak") Signed-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
Without crc32 support, this fails to link: arm-linux-gnueabi-ld: net/wireless/scan.o: in function `cfg80211_scan_6ghz': scan.c:(.text+0x928): undefined reference to `crc32_le' Fixes: c8cb5b85 ("nl80211/cfg80211: support 6 GHz scanning") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 1月, 2021 1 次提交
-
-
由 Xie He 提交于
In lapb_device_event, lapb_devtostruct is called to get a reference to an object of "struct lapb_cb". lapb_devtostruct increases the refcount of the object and returns a pointer to it. However, we didn't decrease the refcount after we finished using the pointer. This patch fixes this problem. Fixes: a4989fa9 ("net/lapb: support netdev events") Cc: Martin Schiller <ms@dev.tdt.de> Signed-off-by: NXie He <xie.he.0141@gmail.com> Link: https://lore.kernel.org/r/20201231174331.64539-1-xie.he.0141@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 29 12月, 2020 11 次提交
-
-
由 Cong Wang 提交于
Both version 0 and version 1 use ETH_P_ERSPAN, but version 0 does not have an erspan header. So the check in gre_parse_header() is wrong, we have to distinguish version 1 from version 0. We can just check the gre header length like is_erspan_type1(). Fixes: cb73ee40 ("net: ip_gre: use erspan key field for tunnel lookup") Reported-by: syzbot+f583ce3d4ddf9836b27a@syzkaller.appspotmail.com Cc: William Tu <u9012063@gmail.com> Cc: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Randy Dunlap 提交于
Check Scell_log shift size in red_check_params() and modify all callers of red_check_params() to pass Scell_log. This prevents a shift out-of-bounds as detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/net/red.h:252:22 shift exponent 72 is too large for 32-bit type 'int' Fixes: 8afa10cb ("net_sched: red: Avoid illegal values") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Reported-by: syzbot+97c5bd9cc81eca63d36e@syzkaller.appspotmail.com Cc: Nogah Frankel <nogahf@mellanox.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: netdev@vger.kernel.org Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 weichenchen 提交于
pneigh_enqueue() tries to obtain a random delay by mod NEIGH_VAR(p, PROXY_DELAY). However, NEIGH_VAR(p, PROXY_DELAY) migth be zero at that point because someone could write zero to /proc/sys/net/ipv4/neigh/[device]/proxy_delay after the callers check it. This patch uses prandom_u32_max() to get a random delay instead which avoids potential division by zero. Signed-off-by: Nweichenchen <weichen.chen@linux.alibaba.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guillaume Nault 提交于
RT_TOS() only clears one of the ECN bits. Therefore, when fib_compute_spec_dst() resorts to a fib lookup, it can return different results depending on the value of the second ECN bit. For example, ECT(0) and ECT(1) packets could be treated differently. $ ip netns add ns0 $ ip netns add ns1 $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev lo up $ ip -netns ns1 link set dev lo up $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10/24 dev veth01 $ ip -netns ns1 address add 192.0.2.11/24 dev veth10 $ ip -netns ns1 address add 192.0.2.21/32 dev lo $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 src 192.0.2.21 $ ip netns exec ns1 sysctl -wq net.ipv4.icmp_echo_ignore_broadcasts=0 With TOS 4 and ECT(1), ns1 replies using source address 192.0.2.21 (ping uses -Q to set all TOS and ECN bits): $ ip netns exec ns0 ping -c 1 -b -Q 5 192.0.2.255 [...] 64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.544 ms But with TOS 4 and ECT(0), ns1 replies using source address 192.0.2.11 because the "tos 4" route isn't matched: $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255 [...] 64 bytes from 192.0.2.11: icmp_seq=1 ttl=64 time=0.597 ms After this patch the ECN bits don't affect the result anymore: $ ip netns exec ns0 ping -c 1 -b -Q 6 192.0.2.255 [...] 64 bytes from 192.0.2.21: icmp_seq=1 ttl=64 time=0.591 ms Fixes: 35ebf65e ("ipv4: Create and use fib_compute_spec_dst() helper.") Signed-off-by: NGuillaume Nault <gnault@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Davide Caratti 提交于
the following syzkaller reproducer: r0 = socket$inet_mptcp(0x2, 0x1, 0x106) bind$inet(r0, &(0x7f0000000080)={0x2, 0x4e24, @multicast2}, 0x10) connect$inet(r0, &(0x7f0000000480)={0x2, 0x4e24, @local}, 0x10) sendto$inet(r0, &(0x7f0000000100)="f6", 0xffffffe7, 0xc000, 0x0, 0x0) systematically triggers the following warning: WARNING: CPU: 2 PID: 8618 at net/core/stream.c:208 sk_stream_kill_queues+0x3fa/0x580 Modules linked in: CPU: 2 PID: 8618 Comm: syz-executor Not tainted 5.10.0+ #334 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/04 RIP: 0010:sk_stream_kill_queues+0x3fa/0x580 Code: df 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 40 8b ab 20 02 00 00 e9 64 ff ff ff e8 df f0 81 2 RSP: 0018:ffffc9000290fcb0 EFLAGS: 00010293 RAX: ffff888011cb8000 RBX: 0000000000000000 RCX: ffffffff86eecf0e RDX: 0000000000000000 RSI: ffffffff86eecf6a RDI: 0000000000000005 RBP: 0000000000000e28 R08: ffff888011cb8000 R09: fffffbfff1f48139 R10: ffffffff8fa409c7 R11: fffffbfff1f48138 R12: ffff8880215e6220 R13: ffffffff8fa409c0 R14: ffffc9000290fd30 R15: 1ffff92000521fa2 FS: 00007f41c78f4800(0000) GS:ffff88802d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f95c803d088 CR3: 0000000025ed2000 CR4: 00000000000006f0 Call Trace: __mptcp_destroy_sock+0x4f5/0x8e0 mptcp_close+0x5e2/0x7f0 inet_release+0x12b/0x270 __sock_release+0xc8/0x270 sock_close+0x18/0x20 __fput+0x272/0x8e0 task_work_run+0xe0/0x1a0 exit_to_user_mode_prepare+0x1df/0x200 syscall_exit_to_user_mode+0x19/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 userspace programs provide arbitrarily high values of 'len' in sendmsg(): this is causing integer overflow of 'amount'. Cap forward allocation to 1 megabyte: higher values are not really useful. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Fixes: e93da928 ("mptcp: implement wmem reservation") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/3334d00d8b2faecafdfab9aa593efcbf61442756.1608584474.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Antoine Tenart 提交于
Accesses to dev->xps_rxqs_map (when using dev->num_tc) should be protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't see an actual bug being triggered, but let's be safe here and take the rtnl lock while accessing the map in sysfs. Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue") Signed-off-by: NAntoine Tenart <atenart@kernel.org> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Antoine Tenart 提交于
Two race conditions can be triggered when storing xps rxqs, resulting in various oops and invalid memory accesses: 1. Calling netdev_set_num_tc while netif_set_xps_queue: - netif_set_xps_queue uses dev->tc_num as one of the parameters to compute the size of new_dev_maps when allocating it. dev->tc_num is also used to access the map, and the compiler may generate code to retrieve this field multiple times in the function. - netdev_set_num_tc sets dev->tc_num. If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is set to a higher value through netdev_set_num_tc, later accesses to new_dev_maps in netif_set_xps_queue could lead to accessing memory outside of new_dev_maps; triggering an oops. 2. Calling netif_set_xps_queue while netdev_set_num_tc is running: 2.1. netdev_set_num_tc starts by resetting the xps queues, dev->tc_num isn't updated yet. 2.2. netif_set_xps_queue is called, setting up the map with the *old* dev->num_tc. 2.3. netdev_set_num_tc updates dev->tc_num. 2.4. Later accesses to the map lead to out of bound accesses and oops. A similar issue can be found with netdev_reset_tc. One way of triggering this is to set an iface up (for which the driver uses netdev_set_num_tc in the open path, such as bnx2x) and writing to xps_rxqs in a concurrent thread. With the right timing an oops is triggered. Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc and netdev_reset_tc should be mutually exclusive. We do that by taking the rtnl lock in xps_rxqs_store. Fixes: 8af2c06f ("net-sysfs: Add interface for Rx queue(s) map per Tx queue") Signed-off-by: NAntoine Tenart <atenart@kernel.org> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Antoine Tenart 提交于
Accesses to dev->xps_cpus_map (when using dev->num_tc) should be protected by the rtnl lock, like we do for netif_set_xps_queue. I didn't see an actual bug being triggered, but let's be safe here and take the rtnl lock while accessing the map in sysfs. Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: NAntoine Tenart <atenart@kernel.org> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Antoine Tenart 提交于
Two race conditions can be triggered when storing xps cpus, resulting in various oops and invalid memory accesses: 1. Calling netdev_set_num_tc while netif_set_xps_queue: - netif_set_xps_queue uses dev->tc_num as one of the parameters to compute the size of new_dev_maps when allocating it. dev->tc_num is also used to access the map, and the compiler may generate code to retrieve this field multiple times in the function. - netdev_set_num_tc sets dev->tc_num. If new_dev_maps is allocated using dev->tc_num and then dev->tc_num is set to a higher value through netdev_set_num_tc, later accesses to new_dev_maps in netif_set_xps_queue could lead to accessing memory outside of new_dev_maps; triggering an oops. 2. Calling netif_set_xps_queue while netdev_set_num_tc is running: 2.1. netdev_set_num_tc starts by resetting the xps queues, dev->tc_num isn't updated yet. 2.2. netif_set_xps_queue is called, setting up the map with the *old* dev->num_tc. 2.3. netdev_set_num_tc updates dev->tc_num. 2.4. Later accesses to the map lead to out of bound accesses and oops. A similar issue can be found with netdev_reset_tc. One way of triggering this is to set an iface up (for which the driver uses netdev_set_num_tc in the open path, such as bnx2x) and writing to xps_cpus in a concurrent thread. With the right timing an oops is triggered. Both issues have the same fix: netif_set_xps_queue, netdev_set_num_tc and netdev_reset_tc should be mutually exclusive. We do that by taking the rtnl lock in xps_cpus_store. Fixes: 184c449f ("net: Add support for XPS with QoS via traffic classes") Signed-off-by: NAntoine Tenart <atenart@kernel.org> Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Ilya Dryomov 提交于
crypto_shash_setkey() and crypto_aead_setkey() will do a (small) GFP_ATOMIC allocation to align the key if it isn't suitably aligned. It's not a big deal, but at the same time easy to avoid. The actual alignment requirement is dynamic, queryable with crypto_shash_alignmask() and crypto_aead_alignmask(), but shouldn't be stricter than 16 bytes for our algorithms. Fixes: cd1a677c ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
由 Ilya Dryomov 提交于
auth_signature frame is 68 bytes in plain mode and 96 bytes in secure mode but we are requesting 68 bytes in both modes. By luck, this doesn't actually result in any invalid memory accesses because the allocation is satisfied out of kmalloc-96 slab and so exactly 96 bytes are allocated, but KASAN rightfully complains. Fixes: cd1a677c ("libceph, ceph: implement msgr2.1 protocol (crc and secure modes)") Reported-by: NLuis Henriques <lhenriques@suse.de> Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
-
- 28 12月, 2020 2 次提交
-
-
由 Pablo Neira Ayuso 提交于
The set flag NFT_SET_EXPR provides a hint to the kernel that userspace supports for multiple expressions per set element. In the same direction, NFT_DYNSET_F_EXPR specifies that dynset expression defines multiple expressions per set element. This allows new userspace software with old kernels to bail out with EOPNOTSUPP. This update is similar to ef516e86 ("netfilter: nf_tables: reintroduce the NFT_SET_CONCAT flag"). The NFT_SET_EXPR flag needs to be set on when the NFTA_SET_EXPRESSIONS attribute is specified. The NFT_SET_EXPR flag is not set on with NFTA_SET_EXPR to retain backward compatibility in old userspace binaries. Fixes: 48b0ae04 ("netfilter: nftables: netlink support for several set element expressions") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Pablo Neira Ayuso 提交于
If userspace requests a feature which is not available the original set definition, then bail out with EOPNOTSUPP. If userspace sends unsupported dynset flags (new feature not supported by this kernel), then report EOPNOTSUPP to userspace. EINVAL should be only used to report malformed netlink messages from userspace. Fixes: 22fe54d5 ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 27 12月, 2020 1 次提交
-
-
由 Florian Westphal 提交于
syzbot reports: detected buffer overflow in strlen [..] Call Trace: strlen include/linux/string.h:325 [inline] strlcpy include/linux/string.h:348 [inline] xt_rateest_tg_checkentry+0x2a5/0x6b0 net/netfilter/xt_RATEEST.c:143 strlcpy assumes src is a c-string. Check info->name before its used. Reported-by: syzbot+e86f7c428c8c50db65b4@syzkaller.appspotmail.com Fixes: 5859034d ("[NETFILTER]: x_tables: add RATEEST target") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 24 12月, 2020 2 次提交
-
-
由 John Wang 提交于
When aggregating ncsi interfaces and dedicated interfaces to bond interfaces, the ncsi response handler will use the wrong net device to find ncsi_dev, so that the ncsi interface will not work properly. Here, we use the original net device to fix it. Fixes: 138635cc ("net/ncsi: NCSI response packet handler") Signed-off-by: NJohn Wang <wangzhiqiang.bj@bytedance.com> Link: https://lore.kernel.org/r/20201223055523.2069-1-wangzhiqiang.bj@bytedance.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Petr Machata 提交于
DCB uses the same handler function for both RTM_GETDCB and RTM_SETDCB messages. dcb_doit() bounces RTM_SETDCB mesasges if the user does not have the CAP_NET_ADMIN capability. However, the operation to be performed is not decided from the DCB message type, but from the DCB command. Thus DCB_CMD_*_GET commands are used for reading DCB objects, the corresponding SET and DEL commands are used for manipulation. The assumption is that set-like commands will be sent via an RTM_SETDCB message, and get-like ones via RTM_GETDCB. However, this assumption is not enforced. It is therefore possible to manipulate DCB objects without CAP_NET_ADMIN capability by sending the corresponding command in an RTM_GETDCB message. That is a bug. Fix it by validating the type of the request message against the type used for the response. Fixes: 2f90b865 ("ixgbe: this patch adds support for DCB to the kernel and ixgbe driver") Signed-off-by: NPetr Machata <me@pmachata.org> Link: https://lore.kernel.org/r/a2a9b88418f3a58ef211b718f2970128ef9e3793.1608673640.git.me@pmachata.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 19 12月, 2020 2 次提交
-
-
由 Davide Caratti 提交于
taprio_graft() can insert a NULL element in the array of child qdiscs. As a consquence, taprio_reset() might not reset child qdiscs completely, and taprio_destroy() might leak resources. Fix it by ensuring that loops that iterate over q->qdiscs[] don't end when they find the first NULL item. Fixes: 44d4775c ("net/sched: sch_taprio: reset child qdiscs before freeing them") Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler") Suggested-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/13edef6778fef03adc751582562fba4a13e06d6a.1608240532.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Baruch Siach 提交于
On 64-bit systems the packet procfs header field names following 'sk' are not aligned correctly: sk RefCnt Type Proto Iface R Rmem User Inode 00000000605d2c64 3 3 0003 7 1 450880 0 16643 00000000080e9b80 2 2 0000 0 0 0 0 17404 00000000b23b8a00 2 2 0000 0 0 0 0 17421 ... With this change field names are correctly aligned: sk RefCnt Type Proto Iface R Rmem User Inode 000000005c3b1d97 3 3 0003 7 1 21568 0 16178 000000007be55bb7 3 3 fbce 8 1 0 0 16250 00000000be62127d 3 3 fbcd 8 1 0 0 16254 ... Signed-off-by: NBaruch Siach <baruch@tkos.co.il> Link: https://lore.kernel.org/r/54917251d8433735d9a24e935a6cb8eb88b4058a.1608103684.git.baruch@tkos.co.ilSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 18 12月, 2020 11 次提交
-
-
由 Magnus Karlsson 提交于
Rollback the reservation in the completion ring when we get a NETDEV_TX_BUSY. When this error is received from the driver, we are supposed to let the user application retry the transmit again. And in order to do this, we need to roll back the failed send so it can be retried. Unfortunately, we did not cancel the reservation we had made in the completion ring. By not doing this, we actually make the completion ring one entry smaller per NETDEV_TX_BUSY error we get, and after enough of these errors the completion ring will be of size zero and transmit will stop working. Fix this by cancelling the reservation when we get a NETDEV_TX_BUSY error. Fixes: 642e450b ("xsk: Do not discard packet when NETDEV_TX_BUSY") Reported-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201218134525.13119-3-magnus.karlsson@gmail.com
-
由 Magnus Karlsson 提交于
Fix a race when multiple sockets are simultaneously calling sendto() when the completion ring is shared in the SKB case. This is the case when you share the same netdev and queue id through the XDP_SHARED_UMEM bind flag. The problem is that multiple processes can be in xsk_generic_xmit() and call the backpressure mechanism in xskq_prod_reserve(xs->pool->cq). As this is a shared resource in this specific scenario, a race might occur since the rings are single-producer single-consumer. Fix this by moving the tx_completion_lock from the socket to the pool as the pool is shared between the sockets that share the completion ring. (The pool is not shared when this is not the case.) And then protect the accesses to xskq_prod_reserve() with this lock. The tx_completion_lock is renamed cq_lock to better reflect that it protects accesses to the potentially shared completion ring. Fixes: 35fcde7f ("xsk: support for Tx") Reported-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201218134525.13119-2-magnus.karlsson@gmail.com
-
由 Magnus Karlsson 提交于
Fix a possible memory leak when a bind of an AF_XDP socket fails. When the fill and completion rings are created, they are tied to the socket. But when the buffer pool is later created at bind time, the ownership of these two rings are transferred to the buffer pool as they might be shared between sockets (and the buffer pool cannot be created until we know what we are binding to). So, before the buffer pool is created, these two rings are cleaned up with the socket, and after they have been transferred they are cleaned up together with the buffer pool. The problem is that ownership was transferred before it was absolutely certain that the buffer pool could be created and initialized correctly and when one of these errors occurred, the fill and completion rings did neither belong to the socket nor the pool and where therefore leaked. Solve this by moving the ownership transfer to the point where the buffer pool has been completely set up and there is no way it can fail. Fixes: 7361f9c3 ("xsk: Move fill and completion rings to buffer pool") Reported-by: syzbot+cfa88ddd0655afa88763@syzkaller.appspotmail.com Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NBjörn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20201214085127.3960-1-magnus.karlsson@gmail.com
-
由 Davide Caratti 提交于
syzkaller shows that packets can still be dequeued while taprio_destroy() is running. Let sch_taprio use the reset() function to cancel the advance timer and drop all skbs from the child qdiscs. Fixes: 5a781ccb ("tc: Add support for configuring the taprio scheduler") Link: https://syzkaller.appspot.com/bug?id=f362872379bf8f0017fb667c1ab158f2d1e764ae Reported-by: syzbot+8971da381fb5a31f542d@syzkaller.appspotmail.com Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Acked-by: NVinicius Costa Gomes <vinicius.gomes@intel.com> Link: https://lore.kernel.org/r/63b6d79b0e830ebb0283e020db4df3cdfdfb2b94.1608142843.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Vasily Averin 提交于
htable_bits() can call jhash_size(32) and trigger shift-out-of-bounds UBSAN: shift-out-of-bounds in net/netfilter/ipset/ip_set_hash_gen.h:151:6 shift exponent 32 is too large for 32-bit type 'unsigned int' CPU: 0 PID: 8498 Comm: syz-executor519 Not tainted 5.10.0-rc7-next-20201208-syzkaller #0 Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x107/0x163 lib/dump_stack.c:120 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395 htable_bits net/netfilter/ipset/ip_set_hash_gen.h:151 [inline] hash_mac_create.cold+0x58/0x9b net/netfilter/ipset/ip_set_hash_gen.h:1524 ip_set_create+0x610/0x1380 net/netfilter/ipset/ip_set_core.c:1115 nfnetlink_rcv_msg+0xecc/0x1180 net/netfilter/nfnetlink.c:252 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2494 nfnetlink_rcv+0x1ac/0x420 net/netfilter/nfnetlink.c:600 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1330 netlink_sendmsg+0x907/0xe40 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:652 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:672 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2345 ___sys_sendmsg+0xf3/0x170 net/socket.c:2399 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2432 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch replaces htable_bits() by simple fls(hashsize - 1) call: it alone returns valid nbits both for round and non-round hashsizes. It is normal to set any nbits here because it is validated inside following htable_size() call which returns 0 for nbits>31. Fixes: 1feab10d("netfilter: ipset: Unified hash type generation") Reported-by: syzbot+d66bfadebca46cf61a2b@syzkaller.appspotmail.com Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Acked-by: NJozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Vasily Averin 提交于
currently mtype_resize() can cause oops t = ip_set_alloc(htable_size(htable_bits)); if (!t) { ret = -ENOMEM; goto out; } t->hregion = ip_set_alloc(ahash_sizeof_regions(htable_bits)); Increased htable_bits can force htable_size() to return 0. In own turn ip_set_alloc(0) returns not 0 but ZERO_SIZE_PTR, so follwoing access to t->hregion should trigger an OOPS. Signed-off-by: NVasily Averin <vvs@virtuozzo.com> Acked-by: NJozsef Kadlecsik <kadlec@netfilter.org> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
This fixes the dereference to fetch the RCU pointer when holding the appropriate xtables lock. Reported-by: Nkernel test robot <lkp@intel.com> Fixes: cc00bcaa ("netfilter: x_tables: Switch synchronization to RCU") Signed-off-by: NSubash Abhinov Kasiviswanathan <subashab@codeaurora.org> Reviewed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Paolo Abeni 提交于
When sendmsg() needs to wait for memory, the pending data is not updated. That causes a drift in forward memory allocation, leading to stall and/or warnings at socket close time. This change addresses the above issue moving the pending data counter update inside the sendmsg() main loop. Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
When multiple subflows are active, we can receive a window update on subflow with no write space available. MPTCP will try to push frames on such subflow and will fail. Pending frames will be pushed only after receiving a window update on a subflow with some wspace available. Overall the above could lead to suboptimal aggregate bandwidth usage. Instead, we should try to push pending frames as soon as the subflow reaches both conditions mentioned above. We can finally enable self-tests with asymmetric links, as the above makes them finally pass. Fixes: 6f8a612a ("mptcp: keep track of advertised windows right edge") Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
MPTCP closes the subflows while holding the msk-level lock. While acquiring the subflow socket lock we need to use the correct nested annotation, or we can hit a lockdep splat at runtime. Reported-and-tested-by: NGeliang Tang <geliangtang@gmail.com> Fixes: e16163b6 ("mptcp: refactor shutdown and close") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Currently MPTCP is not propagating the security context from the ingress request socket to newly created msk at clone time. Address the issue invoking the missing security helper. Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 12月, 2020 2 次提交
-
-
由 Geliang Tang 提交于
This patch cleared use_ack and use_map when dropping other suboptions to fix the following syzkaller BUG: [ 15.223006] BUG: unable to handle page fault for address: 0000000000223b10 [ 15.223700] #PF: supervisor read access in kernel mode [ 15.224209] #PF: error_code(0x0000) - not-present page [ 15.224724] PGD b8d5067 P4D b8d5067 PUD c0a5067 PMD 0 [ 15.225237] Oops: 0000 [#1] SMP [ 15.225556] CPU: 0 PID: 7747 Comm: syz-executor Not tainted 5.10.0-rc6+ #24 [ 15.226281] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 15.227292] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.227816] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.229669] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293 [ 15.230188] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006 [ 15.230895] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700 [ 15.231593] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001 [ 15.232299] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0 [ 15.233007] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002 [ 15.233714] FS: 00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000 [ 15.234509] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.235081] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0 [ 15.235788] Call Trace: [ 15.236042] skb_release_all+0x28/0x30 [ 15.236419] __kfree_skb+0x11/0x20 [ 15.236768] tcp_data_queue+0x270/0x1240 [ 15.237161] ? tcp_urg+0x50/0x2a0 [ 15.237496] tcp_rcv_established+0x39a/0x890 [ 15.237997] ? mark_held_locks+0x49/0x70 [ 15.238467] tcp_v4_do_rcv+0xb9/0x270 [ 15.238915] __release_sock+0x8a/0x160 [ 15.239365] release_sock+0x32/0xd0 [ 15.239793] __inet_stream_connect+0x1d2/0x400 [ 15.240313] ? do_wait_intr_irq+0x80/0x80 [ 15.240791] inet_stream_connect+0x36/0x50 [ 15.241275] mptcp_stream_connect+0x69/0x1b0 [ 15.241787] __sys_connect+0x122/0x140 [ 15.242236] ? syscall_enter_from_user_mode+0x17/0x50 [ 15.242836] ? lockdep_hardirqs_on_prepare+0xd4/0x170 [ 15.243436] __x64_sys_connect+0x1a/0x20 [ 15.243924] do_syscall_64+0x33/0x40 [ 15.244313] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 15.244821] RIP: 0033:0x7f65d946e469 [ 15.245183] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 [ 15.247019] RSP: 002b:00007f65d9b5eda8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 15.247770] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007f65d946e469 [ 15.248471] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005 [ 15.249205] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000 [ 15.249908] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c [ 15.250603] R13: 00007fffe8a25cef R14: 00007f65d9b3f000 R15: 0000000000000003 [ 15.251312] Modules linked in: [ 15.251626] CR2: 0000000000223b10 [ 15.251965] BUG: kernel NULL pointer dereference, address: 0000000000000048 [ 15.252005] ---[ end trace f5c51fe19123c773 ]--- [ 15.252822] #PF: supervisor read access in kernel mode [ 15.252823] #PF: error_code(0x0000) - not-present page [ 15.252825] PGD c6c6067 P4D c6c6067 PUD c0d8067 [ 15.253294] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.253910] PMD 0 [ 15.253914] Oops: 0000 [#2] SMP [ 15.253917] CPU: 1 PID: 7746 Comm: syz-executor Tainted: G D 5.10.0-rc6+ #24 [ 15.253920] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 15.254435] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.254899] RIP: 0010:skb_release_data+0x89/0x1e0 [ 15.254902] Code: 5b 5d 41 5c 41 5d 41 5e 41 5f e9 02 06 8a ff e8 fd 05 8a ff 45 31 ed 80 7d 02 00 4c 8d 65 30 74 55 e8 eb 05 8a ff 49 8b 1c 24 <4c> 8b 7b 08 41 f6 c7 01 0f 85 18 01 00 00 e8 d4 05 8a ff 8b 43 34 [ 15.254905] RSP: 0018:ffffc900019bfc08 EFLAGS: 00010293 [ 15.255376] RSP: 0018:ffffc900019c7c08 EFLAGS: 00010293 [ 15.255580] [ 15.255583] RAX: ffff888004a7ac80 RBX: 0000000000000040 RCX: 0000000000000000 [ 15.255912] [ 15.256724] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6ddd00 [ 15.257620] RAX: ffff88800daad900 RBX: 0000000000223b08 RCX: 0000000000000006 [ 15.259817] RBP: ffff88800e9006c0 R08: 0000000000000000 R09: 0000000000000000 [ 15.259818] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88800e9006f0 [ 15.259820] R13: 0000000000000000 R14: ffff88807f6ddd00 R15: 0000000000000002 [ 15.259822] FS: 00007fae4a60a700(0000) GS:ffff88807c500000(0000) knlGS:0000000000000000 [ 15.259826] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.260296] RDX: 0000000000000000 RSI: ffffffff818e06c5 RDI: ffff88807f6dc700 [ 15.262514] CR2: 0000000000000048 CR3: 000000000b89c000 CR4: 00000000000006e0 [ 15.262515] Call Trace: [ 15.262519] skb_release_all+0x28/0x30 [ 15.262523] __kfree_skb+0x11/0x20 [ 15.263054] RBP: ffff88807f71a4c0 R08: 0000000000000001 R09: 0000000000000001 [ 15.263680] tcp_data_queue+0x270/0x1240 [ 15.263843] R10: ffffc900019c7c18 R11: 0000000000000000 R12: ffff88807f71a4f0 [ 15.264693] ? tcp_urg+0x50/0x2a0 [ 15.264856] R13: 0000000000000000 R14: ffff88807f6dc700 R15: 0000000000000002 [ 15.265720] tcp_rcv_established+0x39a/0x890 [ 15.266438] FS: 00007f65d9b5f700(0000) GS:ffff88807c400000(0000) knlGS:0000000000000000 [ 15.267283] ? __schedule+0x3fa/0x880 [ 15.267287] tcp_v4_do_rcv+0xb9/0x270 [ 15.267290] __release_sock+0x8a/0x160 [ 15.268049] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 15.268788] release_sock+0x32/0xd0 [ 15.268791] __inet_stream_connect+0x1d2/0x400 [ 15.268795] ? do_wait_intr_irq+0x80/0x80 [ 15.269593] CR2: 0000000000223b10 CR3: 000000000b883000 CR4: 00000000000006f0 [ 15.270246] inet_stream_connect+0x36/0x50 [ 15.270250] mptcp_stream_connect+0x69/0x1b0 [ 15.270253] __sys_connect+0x122/0x140 [ 15.271097] Kernel panic - not syncing: Fatal exception [ 15.271820] ? syscall_enter_from_user_mode+0x17/0x50 [ 15.283542] ? lockdep_hardirqs_on_prepare+0xd4/0x170 [ 15.284275] __x64_sys_connect+0x1a/0x20 [ 15.284853] do_syscall_64+0x33/0x40 [ 15.285369] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 15.286105] RIP: 0033:0x7fae49f19469 [ 15.286638] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 [ 15.289295] RSP: 002b:00007fae4a609da8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 15.290375] RAX: ffffffffffffffda RBX: 000000000049bf00 RCX: 00007fae49f19469 [ 15.291403] RDX: 0000000000000010 RSI: 00000000200000c0 RDI: 0000000000000005 [ 15.292437] RBP: 000000000049bf00 R08: 0000000000000000 R09: 0000000000000000 [ 15.293456] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049bf0c [ 15.294473] R13: 00007fff0004b6bf R14: 00007fae4a5ea000 R15: 0000000000000003 [ 15.295492] Modules linked in: [ 15.295944] CR2: 0000000000000048 [ 15.296567] Kernel Offset: disabled [ 15.296941] ---[ end Kernel panic - not syncing: Fatal exception ]--- Reported-by: NChristoph Paasch <cpaasch@apple.com> Fixes: 84dfe367 (mptcp: send out dedicated ADD_ADDR packet) Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/ccca4e8f01457a1b495c5d612ed16c5f7a585706.1608010058.git.geliangtang@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Karsten Graul 提交于
The parent of an ib device is used to retrieve the PCI device attributes. It turns out that there are possible cases when an ib device has no parent set in the device structure, which may lead to page faults when trying to access this memory. Fix that by checking the parent pointer and consolidate the pci device specific processing in a new function. Fixes: a3db10ef ("net/smc: Add support for obtaining SMCR device list") Reported-by: syzbot+600fef7c414ee7e2d71b@syzkaller.appspotmail.com Signed-off-by: NKarsten Graul <kgraul@linux.ibm.com> Link: https://lore.kernel.org/r/20201215091058.49354-2-kgraul@linux.ibm.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-