- 24 10月, 2022 1 次提交
-
-
由 Kuniyuki Iwashima 提交于
After commit d38afeec ("tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in sk->sk_destruct() by setting inet6_sock_destruct() to it to make sure we do not leak inet6-specific resources. Now we can remove unnecessary inet6_destroy_sock() calls in sk->sk_prot->destroy(). DCCP and SCTP have their own sk->sk_destruct() function, so we change them separately in the following patches. Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2022 3 次提交
-
-
由 Paolo Abeni 提交于
The MPTCP data path is quite complex and hard to understend even without some foggy comments referring to modified code and/or completely misleading from the beginning. Update a few of them to more accurately describing the current status. Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Daire reported a user-space application hang-up when the peer is forcibly closed before the data transfer completion. The relevant application expects the peer to either do an application-level clean shutdown or a transport-level connection reset. We can accommodate a such user by extending the fastclose usage: at fd close time, if the msk socket has some unread data, and at FIN_WAIT timeout. Note that at MPTCP close time we must ensure that the TCP subflows will reset: set the linger socket option to a suitable value. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
When an mptcp socket is closed due to an incoming FASTCLOSE option, so specific sk_err is set and later syscall will fail usually with EPIPE. Align the current fastclose error handling with TCP reset, properly setting the socket error according to the current msk state and propagating such error. Additionally sendmsg() is currently not handling properly the sk_err, always returning EPIPE. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 9月, 2022 4 次提交
-
-
由 Menglong Dong 提交于
The mptcp socket and its subflow sockets in accept queue can't be released after the process exit. While the release of a mptcp socket in listening state, the corresponding tcp socket will be released too. Meanwhile, the tcp socket in the unaccept queue will be released too. However, only init subflow is in the unaccept queue, and the joined subflow is not in the unaccept queue, which makes the joined subflow won't be released, and therefore the corresponding unaccepted mptcp socket will not be released to. This can be reproduced easily with following steps: 1. create 2 namespace and veth: $ ip netns add mptcp-client $ ip netns add mptcp-server $ sysctl -w net.ipv4.conf.all.rp_filter=0 $ ip netns exec mptcp-client sysctl -w net.mptcp.enabled=1 $ ip netns exec mptcp-server sysctl -w net.mptcp.enabled=1 $ ip link add red-client netns mptcp-client type veth peer red-server \ netns mptcp-server $ ip -n mptcp-server address add 10.0.0.1/24 dev red-server $ ip -n mptcp-server address add 192.168.0.1/24 dev red-server $ ip -n mptcp-client address add 10.0.0.2/24 dev red-client $ ip -n mptcp-client address add 192.168.0.2/24 dev red-client $ ip -n mptcp-server link set red-server up $ ip -n mptcp-client link set red-client up 2. configure the endpoint and limit for client and server: $ ip -n mptcp-server mptcp endpoint flush $ ip -n mptcp-server mptcp limits set subflow 2 add_addr_accepted 2 $ ip -n mptcp-client mptcp endpoint flush $ ip -n mptcp-client mptcp limits set subflow 2 add_addr_accepted 2 $ ip -n mptcp-client mptcp endpoint add 192.168.0.2 dev red-client id \ 1 subflow 3. listen and accept on a port, such as 9999. The nc command we used here is modified, which makes it use mptcp protocol by default. $ ip netns exec mptcp-server nc -l -k -p 9999 4. open another *two* terminal and use each of them to connect to the server with the following command: $ ip netns exec mptcp-client nc 10.0.0.1 9999 Input something after connect to trigger the connection of the second subflow. So that there are two established mptcp connections, with the second one still unaccepted. 5. exit all the nc command, and check the tcp socket in server namespace. And you will find that there is one tcp socket in CLOSE_WAIT state and can't release forever. Fix this by closing all of the unaccepted mptcp socket in mptcp_subflow_queue_clean() with __mptcp_close(). Now, we can ensure that all unaccepted mptcp sockets will be cleaned by __mptcp_close() before they are released, so mptcp_sock_destruct(), which is used to clean the unaccepted mptcp socket, is not needed anymore. The selftests for mptcp is ran for this commit, and no new failures. Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests") Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets") Cc: stable@vger.kernel.org Reviewed-by: NJiang Biao <benbjiang@tencent.com> Reviewed-by: NMengen Sun <mengensun@tencent.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMenglong Dong <imagedong@tencent.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org> -
由 Menglong Dong 提交于
Factor out __mptcp_close() from mptcp_close(). The caller of __mptcp_close() should hold the socket lock, and cancel mptcp work when __mptcp_close() returns true. This function will be used in the next commit. Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests") Fixes: 6aeed904 ("mptcp: fix race on unaccepted mptcp sockets") Cc: stable@vger.kernel.org Reviewed-by: NJiang Biao <benbjiang@tencent.com> Reviewed-by: NMengen Sun <mengensun@tencent.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMenglong Dong <imagedong@tencent.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Benjamin Hesmans 提交于
If fastopen is used, poll must allow a first write that will trigger the SYN+data Similar to what is done in tcp_poll(). Acked-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NBenjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Dmytro Shytyi 提交于
When TCP_FASTOPEN_CONNECT has been set on the socket before a connect, the defer flag is set and must be handled when sendmsg is called. This is similar to what is done in tcp_sendmsg_locked(). Acked-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Co-developed-by: NBenjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: NBenjamin Hesmans <benjamin.hesmans@tessares.net> Signed-off-by: NDmytro Shytyi <dmytro@shytyi.net> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 9月, 2022 2 次提交
-
-
由 Geliang Tang 提交于
This patch adds a new bool variable 'do_check_data_fin' to replace the original int variable 'copied' in __mptcp_push_pending(), check it to determine whether to call __mptcp_check_send_data_fin(). Suggested-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
-
由 Matthieu Baerts 提交于
Similar to mptcp_for_each_subflow(): this is clearer now that the _safe version is used in multiple places. Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
-
- 13 9月, 2022 1 次提交
-
-
由 Paolo Abeni 提交于
The intel bot reported a memory accounting related splat: [ 240.473094] ------------[ cut here ]------------ [ 240.478507] page_counter underflow: -4294828518 nr_pages=4294967290 [ 240.485500] WARNING: CPU: 2 PID: 14986 at mm/page_counter.c:56 page_counter_cancel+0x96/0xc0 [ 240.570849] CPU: 2 PID: 14986 Comm: mptcp_connect Tainted: G S 5.19.0-rc4-00739-gd24141fe #1 [ 240.581637] Hardware name: HP HP Z240 SFF Workstation/802E, BIOS N51 Ver. 01.63 10/05/2017 [ 240.590600] RIP: 0010:page_counter_cancel+0x96/0xc0 [ 240.596179] Code: 00 00 00 45 31 c0 48 89 ef 5d 4c 89 c6 41 5c e9 40 fd ff ff 4c 89 e2 48 c7 c7 20 73 39 84 c6 05 d5 b1 52 04 01 e8 e7 95 f3 01 <0f> 0b eb a9 48 89 ef e8 1e 25 fc ff eb c3 66 66 2e 0f 1f 84 00 00 [ 240.615639] RSP: 0018:ffffc9000496f7c8 EFLAGS: 00010082 [ 240.621569] RAX: 0000000000000000 RBX: ffff88819c9c0120 RCX: 0000000000000000 [ 240.629404] RDX: 0000000000000027 RSI: 0000000000000004 RDI: fffff5200092deeb [ 240.637239] RBP: ffff88819c9c0120 R08: 0000000000000001 R09: ffff888366527a2b [ 240.645069] R10: ffffed106cca4f45 R11: 0000000000000001 R12: 00000000fffffffa [ 240.652903] R13: ffff888366536118 R14: 00000000fffffffa R15: ffff88819c9c0000 [ 240.660738] FS: 00007f3786e72540(0000) GS:ffff888366500000(0000) knlGS:0000000000000000 [ 240.669529] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.675974] CR2: 00007f966b346000 CR3: 0000000168cea002 CR4: 00000000003706e0 [ 240.683807] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 240.691641] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 240.699468] Call Trace: [ 240.702613] <TASK> [ 240.705413] page_counter_uncharge+0x29/0x80 [ 240.710389] drain_stock+0xd0/0x180 [ 240.714585] refill_stock+0x278/0x580 [ 240.718951] __sk_mem_reduce_allocated+0x222/0x5c0 [ 240.729248] __mptcp_update_rmem+0x235/0x2c0 [ 240.734228] __mptcp_move_skbs+0x194/0x6c0 [ 240.749764] mptcp_recvmsg+0xdfa/0x1340 [ 240.763153] inet_recvmsg+0x37f/0x500 [ 240.782109] sock_read_iter+0x24a/0x380 [ 240.805353] new_sync_read+0x420/0x540 [ 240.838552] vfs_read+0x37f/0x4c0 [ 240.842582] ksys_read+0x170/0x200 [ 240.864039] do_syscall_64+0x5c/0x80 [ 240.872770] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 240.878526] RIP: 0033:0x7f3786d9ae8e [ 240.882805] Code: c0 e9 b6 fe ff ff 50 48 8d 3d 6e 18 0a 00 e8 89 e8 01 00 66 0f 1f 84 00 00 00 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 0f 05 <48> 3d 00 f0 ff ff 77 5a c3 66 0f 1f 84 00 00 00 00 00 48 83 ec 28 [ 240.902259] RSP: 002b:00007fff7be81e08 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 240.910533] RAX: ffffffffffffffda RBX: 0000000000002000 RCX: 00007f3786d9ae8e [ 240.918368] RDX: 0000000000002000 RSI: 00007fff7be87ec0 RDI: 0000000000000005 [ 240.926206] RBP: 0000000000000005 R08: 00007f3786e6a230 R09: 00007f3786e6a240 [ 240.934046] R10: fffffffffffff288 R11: 0000000000000246 R12: 0000000000002000 [ 240.941884] R13: 00007fff7be87ec0 R14: 00007fff7be87ec0 R15: 0000000000002000 [ 240.949741] </TASK> [ 240.952632] irq event stamp: 27367 [ 240.956735] hardirqs last enabled at (27366): [<ffffffff81ba50ea>] mem_cgroup_uncharge_skmem+0x6a/0x80 [ 240.966848] hardirqs last disabled at (27367): [<ffffffff81b8fd42>] refill_stock+0x282/0x580 [ 240.976017] softirqs last enabled at (27360): [<ffffffff83a4d8ef>] mptcp_recvmsg+0xaf/0x1340 [ 240.985273] softirqs last disabled at (27364): [<ffffffff83a4d30c>] __mptcp_move_skbs+0x18c/0x6c0 [ 240.994872] ---[ end trace 0000000000000000 ]--- After commit d24141fe ("mptcp: drop SK_RECLAIM_* macros"), if rmem_fwd_alloc become negative, mptcp_rmem_uncharge() can try to reclaim a negative amount of pages, since the expression: reclaimable >= PAGE_SIZE will evaluate to true for any negative value of the int 'reclaimable': 'PAGE_SIZE' is an unsigned long and the negative integer will be promoted to a (very large) unsigned long value. Still after the mentioned commit, kfree_skb_partial() in mptcp_try_coalesce() will reclaim most of just released fwd memory, so that following charging of the skb delta size will lead to negative fwd memory values. At that point a racing recvmsg() can trigger the splat. Address the issue switching the order of the memory accounting operations. The fwd memory can still transiently reach negative values, but that will happen in an atomic scope and no code path could touch/use such value. Reported-by: Nkernel test robot <oliver.sang@intel.com> Fixes: d24141fe ("mptcp: drop SK_RECLAIM_* macros") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Link: https://lore.kernel.org/r/20220906180404.1255873-1-matthieu.baerts@tessares.netSigned-off-by: NPaolo Abeni <pabeni@redhat.com>
-
- 24 8月, 2022 1 次提交
-
-
由 Kuniyuki Iwashima 提交于
While reading sysctl_max_skb_frags, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 5f74f82e ("net:Add sysctl_max_skb_frags") Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 8月, 2022 2 次提交
-
-
由 Paolo Abeni 提交于
Dipanjan reported a syzbot splat at close time: WARNING: CPU: 1 PID: 10818 at net/ipv4/af_inet.c:153 inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153 Modules linked in: uio_ivshmem(OE) uio(E) CPU: 1 PID: 10818 Comm: kworker/1:16 Tainted: G OE 5.19.0-rc6-g2eae0556bb9d #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: events mptcp_worker RIP: 0010:inet_sock_destruct+0x6d0/0x8e0 net/ipv4/af_inet.c:153 Code: 21 02 00 00 41 8b 9c 24 28 02 00 00 e9 07 ff ff ff e8 34 4d 91 f9 89 ee 4c 89 e7 e8 4a 47 60 ff e9 a6 fc ff ff e8 20 4d 91 f9 <0f> 0b e9 84 fe ff ff e8 14 4d 91 f9 0f 0b e9 d4 fd ff ff e8 08 4d RSP: 0018:ffffc9001b35fa78 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000002879d0 RCX: ffff8881326f3b00 RDX: 0000000000000000 RSI: ffff8881326f3b00 RDI: 0000000000000002 RBP: ffff888179662674 R08: ffffffff87e983a0 R09: 0000000000000000 R10: 0000000000000005 R11: 00000000000004ea R12: ffff888179662400 R13: ffff888179662428 R14: 0000000000000001 R15: ffff88817e38e258 FS: 0000000000000000(0000) GS:ffff8881f5f00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020007bc0 CR3: 0000000179592000 CR4: 0000000000150ee0 Call Trace: <TASK> __sk_destruct+0x4f/0x8e0 net/core/sock.c:2067 sk_destruct+0xbd/0xe0 net/core/sock.c:2112 __sk_free+0xef/0x3d0 net/core/sock.c:2123 sk_free+0x78/0xa0 net/core/sock.c:2134 sock_put include/net/sock.h:1927 [inline] __mptcp_close_ssk+0x50f/0x780 net/mptcp/protocol.c:2351 __mptcp_destroy_sock+0x332/0x760 net/mptcp/protocol.c:2828 mptcp_worker+0x5d2/0xc90 net/mptcp/protocol.c:2586 process_one_work+0x9cc/0x1650 kernel/workqueue.c:2289 worker_thread+0x623/0x1070 kernel/workqueue.c:2436 kthread+0x2e9/0x3a0 kernel/kthread.c:376 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:302 </TASK> The root cause of the problem is that an mptcp-level (re)transmit can race with mptcp_close() and the packet scheduler checks the subflow state before acquiring the socket lock: we can try to (re)transmit on an already closed ssk. Fix the issue checking again the subflow socket status under the subflow socket lock protection. Additionally add the missing check for the fallback-to-tcp case. Fixes: d5f49190 ("mptcp: allow picking different xmit subflows") Reported-by: NDipanjan Das <mail.dipanjan.das@gmail.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
If the mptcp socket creation fails due to a CGROUP_INET_SOCK_CREATE eBPF program, the MPTCP protocol ends-up leaking all the subflows: the related cleanup happens in __mptcp_destroy_sock() that is not invoked in such code path. Address the issue moving the subflow sockets cleanup in the mptcp_destroy_common() helper, which is invoked in every msk cleanup path. Additionally get rid of the intermediate list_splice_init step, which is an unneeded relic from the past. The issue is present since before the reported root cause commit, but any attempt to backport the fix before that hash will require a complete rewrite. Fixes: e16163b6 ("mptcp: refactor shutdown and close") Reported-by: NNguyen Dinh Phi <phind.uet@gmail.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Co-developed-by: NNguyen Dinh Phi <phind.uet@gmail.com> Signed-off-by: NNguyen Dinh Phi <phind.uet@gmail.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 7月, 2022 1 次提交
-
-
由 Kuniyuki Iwashima 提交于
While reading these sysctl variables, they can be changed concurrently. Thus, we need to add READ_ONCE() to their readers. - .sysctl_rmem - .sysctl_rwmem - .sysctl_rmem_offset - .sysctl_wmem_offset - sysctl_tcp_rmem[1, 2] - sysctl_tcp_wmem[1, 2] - sysctl_decnet_rmem[1] - sysctl_decnet_wmem[1] - sysctl_tipc_rmem[1] Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2022 1 次提交
-
-
由 Kuniyuki Iwashima 提交于
While reading sysctl_tcp_moderate_rcvbuf, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 7月, 2022 1 次提交
-
-
由 Paolo Abeni 提交于
The in-kernel PM has a bit of duplicate code related to ack generation. Create a new helper factoring out the PM-specific needs and use it in a couple of places. As a bonus, mptcp_subflow_send_ack() is not used anymore outside its own compilation unit and can become static. Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 11 7月, 2022 1 次提交
-
-
由 Paolo Abeni 提交于
At disconnect time the MPTCP protocol traverse the subflows list closing each of them. In some circumstances - MPJ subflow, passive MPTCP socket, the latter operation can remove the subflow from the list, invalidating the current iterator. Address the issue using the safe list traversing helper variant. Reported-by: Nvan fantasy <g1042620637@gmail.com> Fixes: b29fcfb5 ("mptcp: full disconnect implementation") Tested-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2022 1 次提交
-
-
由 Mat Martineau 提交于
When setting up a subflow's flags for sending MP_PRIO MPTCP options, the subflow socket lock was not held while reading and modifying several struct members that are also read and modified in mptcp_write_options(). Acquire the subflow socket lock earlier and send the MP_PRIO ACK with that lock already acquired. Add a new variant of the mptcp_subflow_send_ack() helper to use with the subflow lock held. Fixes: 06706542 ("mptcp: add the outgoing MP_PRIO support") Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2022 3 次提交
-
-
由 Paolo Abeni 提交于
Similar to commit 7c80b038 ("net: fix sk_wmem_schedule() and sk_rmem_schedule() errors"), let the MPTCP receive path schedule exactly the required amount of memory. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
After commit 4890b686 ("net: keep sk->sk_forward_alloc as small as possible"), the MPTCP protocol is the last SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD users. Update the MPTCP reclaim schema to match the core/TCP one and drop the mentioned macros. This additionally clean the MPTCP code a bit. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
The memory accounting is broken in such exceptional code path, and after commit 4890b686 ("net: keep sk->sk_forward_alloc as small as possible") we can't find much help there. Drop the broken code. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 6月, 2022 3 次提交
-
-
由 Paolo Abeni 提交于
When the listener socket owning the relevant request is closed, it frees the unaccepted subflows and that causes later deletion of the paired MPTCP sockets. The mptcp socket's worker can run in the time interval between such delete operations. When that happens, any access to msk->first will cause an UaF access, as the subflow cleanup did not cleared such field in the mptcp socket. Address the issue explicitly traversing the listener socket accept queue at close time and performing the needed cleanup on the pending msk. Note that the locking is a bit tricky, as we need to acquire the msk socket lock, while still owning the subflow socket one. Fixes: 86e39e04 ("mptcp: keep track of local endpoint still available for each msk") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
If the MPTCP socket shutdown happens before a fallback to TCP, and all the pending data have been already spooled, we never close the TCP connection. Address the issue explicitly checking for critical condition at fallback time. Fixes: 1e39e5a3 ("mptcp: infinite mapping sending") Fixes: 0348c690 ("mptcp: add the fallback check") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
mptcp_mp_fail_no_response shouldn't be invoked on each worker run, it should be invoked only when MP_FAIL response timeout occurs. This patch refactors the MP_FAIL response logic. It leverages the fact that only the MPC/first subflow can gracefully fail to avoid unneeded subflows traversal: the failing subflow can be only msk->first. A new 'fail_tout' field is added to the subflow context to record the MP_FAIL response timeout and use such field to reliably share the timeout timer between the MP_FAIL event and the MPTCP socket close timeout. Finally, a new ack is generated to send out MP_FAIL notification as soon as we hit the relevant condition, instead of waiting a possibly unbound time for the next data packet. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/281 Fixes: d9fb7970 ("mptcp: Do not traverse the subflow connection list without lock") Co-developed-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 11 6月, 2022 3 次提交
-
-
由 Eric Dumazet 提交于
Currently, tcp_memory_allocated can hit tcp_mem[] limits quite fast. Each TCP socket can forward allocate up to 2 MB of memory, even after flow became less active. 10,000 sockets can have reserved 20 GB of memory, and we have no shrinker in place to reclaim that. Instead of trying to reclaim the extra allocations in some places, just keep sk->sk_forward_alloc values as small as possible. This should not impact performance too much now we have per-cpu reserves: Changes to tcp_memory_allocated should not be too frequent. For sockets not using SO_RESERVE_MEM: - idle sockets (no packets in tx/rx queues) have zero forward alloc. - non idle sockets have a forward alloc smaller than one page. Note: - Removal of SK_RECLAIM_CHUNK and SK_RECLAIM_THRESHOLD is left to MPTCP maintainers as a follow up. Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Eric Dumazet 提交于
Each protocol having a ->memory_allocated pointer gets a corresponding per-cpu reserve, that following patches will use. Instead of having reserved bytes per socket, we want to have per-cpu reserves. Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Eric Dumazet 提交于
Due to memcg interface, SK_MEM_QUANTUM is effectively PAGE_SIZE. This might change in the future, but it seems better to avoid the confusion. Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 20 5月, 2022 1 次提交
-
-
由 Mat Martineau 提交于
The MPTCP socket's conn_list (list of subflows) requires the socket lock to access. The MP_FAIL timeout code added such an access, where it would check the list of subflows both in timer context and (later) in workqueue context where the socket lock is held. Rather than check the list twice, remove the check in the timeout handler and only depend on the check in the workqueue. Also remove the MPTCP_FAIL_NO_RESPONSE flag, since mptcp_mp_fail_no_response() has insignificant overhead and can be checked on each worker run. Fixes: 49fa1919 ("mptcp: reset subflow when MP_FAIL doesn't respond") Reported-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 5月, 2022 1 次提交
-
-
由 Paolo Abeni 提交于
This reverts commit 4293248c. Additional locks are not needed, all the touched sections are already under mptcp socket lock protection. Fixes: 4293248c ("mptcp: add data lock for sk timers") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 06 5月, 2022 4 次提交
-
-
由 Paolo Abeni 提交于
As per RFC, the offered MPTCP-level window should never shrink. While we currently track the right edge, we don't enforce the above constraint on the wire. Additionally, concurrent xmit on different subflows can end-up in erroneous right edge update. Address the above explicitly updating the announced window and protecting the update with an additional atomic operation (sic) Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Bump a counter for counter when snd_wnd is shared among subflow, for observability's sake. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
As per RFC, mptcp subflows use a "shared" snd_wnd: the effective window is the maximum among the current values received on all subflows. Without such feature a data transfer using multiple subflows could block. Window sharing is currently implemented in the RX side: __tcp_select_window uses the mptcp-level receive buffer to compute the announced window. That is not enough: the TCP stack will stick to the window size received on the given subflow; we need to propagate the msk window value on each subflow at xmit time. Change the packet scheduler to ignore the subflow level window and use instead the msk level one Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Jakub Kicinski 提交于
Switch net callers to the new API not requiring the NAPI_POLL_WEIGHT argument. Acked-by: NFlorian Fainelli <f.fainelli@gmail.com> Reviewed-by: NAlex Elder <elder@linaro.org> Acked-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Acked-by: NAlexandra Winter <wintera@linux.ibm.com> Link: https://lore.kernel.org/r/20220504163725.550782-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 04 5月, 2022 2 次提交
-
-
由 Kishen Maloor 提交于
This change adds an internal function to store/retrieve local addrs announced by userspace PM implementations to/from its kernel context. The function addresses the requirements of three scenarios: 1) ADD_ADDR announcements (which require that a local id be provided), 2) retrieving the local id associated with an address, and also where one may need to be assigned, and 3) reissuance of ADD_ADDRs when there's a successful match of addr/id. The list of all stored local addr entries is held under the MPTCP sock structure. Memory for these entries is allocated from the sock option buffer, so the list of addrs is bounded by optmem_max. The list if not released via REMOVE_ADDR signals is ultimately freed when the sock is destructed. Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NKishen Maloor <kishen.maloor@intel.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kishen Maloor 提交于
This change updates internal logic to permit subflows to be established from either the client or server ends of MPTCP connections. This symmetry and added flexibility may be harnessed by PM implementations running on either end in creating new subflows. The essence of this change lies in not relying on the "server_side" flag (which continues to be available if needed). Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NKishen Maloor <kishen.maloor@intel.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 27 4月, 2022 3 次提交
-
-
由 Geliang Tang 提交于
This patch adds a new msk->flags bit MPTCP_FAIL_NO_RESPONSE, then reuses sk_timer to trigger a check if we have not received a response from the peer after sending MP_FAIL. If the peer doesn't respond properly, reset the subflow. Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geliang Tang 提交于
mptcp_data_lock() needs to be held when manipulating the msk retransmit_timer or the sk sk_timer. This patch adds the data lock for the both timers. Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Geliang Tang 提交于
Use the helper mptcp_stop_timer() instead of using sk_stop_timer() to stop icsk_retransmit_timer directly. Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 4月, 2022 1 次提交
-
-
由 Geliang Tang 提交于
This patch adds a new mib named MPTCP_MIB_INFINITEMAPTX, increase it when a infinite mapping has been sent out. Signed-off-by: NGeliang Tang <geliang.tang@suse.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-