- 23 1月, 2021 2 次提交
-
-
由 Paolo Abeni 提交于
After commit 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks"), MPTCP never sets the flag bit SOCK_NOSPACE on its subflow. As a side effect, autotune never takes place, as it happens inside tcp_new_space(), which in turn is called only when the mentioned bit is set. Let's sendmsg() set the subflows NOSPACE bit when looking for more memory and use the subflow write_space callback to propagate the snd buf update and wake-up the user-space. Additionally, this allows dropping a bunch of duplicate code and makes the SNDBUF_LIMITED chrono relevant again for MPTCP subflows. Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Currently, incoming subflows link to the parent socket, while outgoing ones link to a per subflow socket. The latter is not really needed, except at the initial connect() time and for the first subflow. Always graft the outgoing subflow to the parent socket and free the unneeded ones early. This allows some code cleanup, reduces the amount of memory used and will simplify the next patch Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 1月, 2021 1 次提交
-
-
由 Paolo Abeni 提交于
tcp_disconnect() expects the caller acquires the sock lock, but mptcp_disconnect() is not doing that. Add the missing required lock. Reported-by: NEric Dumazet <eric.dumazet@gmail.com> Fixes: 76e2a55d ("mptcp: better msk-level shutdown.") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/f818e82b58a556feeb71dcccc8bf1c87aafc6175.1610638176.git.pabeni@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 13 1月, 2021 2 次提交
-
-
由 Paolo Abeni 提交于
Instead of re-implementing most of inet_shutdown, re-use such helper, and implement the MPTCP-specific bits at the 'proto' level. The msk-level disconnect() can now be invoked, lets provide a suitable implementation. As a side effect, this fixes bad state management for listener sockets. The latter could lead to division by 0 oops since commit ea4ca586 ("mptcp: refine MPTCP-level ack scheduling"). Fixes: 43b54c6e ("mptcp: Use full MPTCP-level disconnect state machine") Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Syzkaller found a way to trigger division by zero in mptcp_subflow_cleanup_rbuf(). The current checks implemented into tcp_can_send_ack() are too week, let's be more accurate. Reported-by: NChristoph Paasch <cpaasch@apple.com> Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling") Fixes: fd897679 ("mptcp: be careful on MPTCP-level ack.") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 29 12月, 2020 1 次提交
-
-
由 Davide Caratti 提交于
the following syzkaller reproducer: r0 = socket$inet_mptcp(0x2, 0x1, 0x106) bind$inet(r0, &(0x7f0000000080)={0x2, 0x4e24, @multicast2}, 0x10) connect$inet(r0, &(0x7f0000000480)={0x2, 0x4e24, @local}, 0x10) sendto$inet(r0, &(0x7f0000000100)="f6", 0xffffffe7, 0xc000, 0x0, 0x0) systematically triggers the following warning: WARNING: CPU: 2 PID: 8618 at net/core/stream.c:208 sk_stream_kill_queues+0x3fa/0x580 Modules linked in: CPU: 2 PID: 8618 Comm: syz-executor Not tainted 5.10.0+ #334 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/04 RIP: 0010:sk_stream_kill_queues+0x3fa/0x580 Code: df 48 c1 ea 03 0f b6 04 02 84 c0 74 04 3c 03 7e 40 8b ab 20 02 00 00 e9 64 ff ff ff e8 df f0 81 2 RSP: 0018:ffffc9000290fcb0 EFLAGS: 00010293 RAX: ffff888011cb8000 RBX: 0000000000000000 RCX: ffffffff86eecf0e RDX: 0000000000000000 RSI: ffffffff86eecf6a RDI: 0000000000000005 RBP: 0000000000000e28 R08: ffff888011cb8000 R09: fffffbfff1f48139 R10: ffffffff8fa409c7 R11: fffffbfff1f48138 R12: ffff8880215e6220 R13: ffffffff8fa409c0 R14: ffffc9000290fd30 R15: 1ffff92000521fa2 FS: 00007f41c78f4800(0000) GS:ffff88802d000000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f95c803d088 CR3: 0000000025ed2000 CR4: 00000000000006f0 Call Trace: __mptcp_destroy_sock+0x4f5/0x8e0 mptcp_close+0x5e2/0x7f0 inet_release+0x12b/0x270 __sock_release+0xc8/0x270 sock_close+0x18/0x20 __fput+0x272/0x8e0 task_work_run+0xe0/0x1a0 exit_to_user_mode_prepare+0x1df/0x200 syscall_exit_to_user_mode+0x19/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xa9 userspace programs provide arbitrarily high values of 'len' in sendmsg(): this is causing integer overflow of 'amount'. Cap forward allocation to 1 megabyte: higher values are not really useful. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Fixes: e93da928 ("mptcp: implement wmem reservation") Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/3334d00d8b2faecafdfab9aa593efcbf61442756.1608584474.git.dcaratti@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 18 12月, 2020 4 次提交
-
-
由 Paolo Abeni 提交于
When sendmsg() needs to wait for memory, the pending data is not updated. That causes a drift in forward memory allocation, leading to stall and/or warnings at socket close time. This change addresses the above issue moving the pending data counter update inside the sendmsg() main loop. Fixes: 6e628cd3 ("mptcp: use mptcp release_cb for delayed tasks") Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
When multiple subflows are active, we can receive a window update on subflow with no write space available. MPTCP will try to push frames on such subflow and will fail. Pending frames will be pushed only after receiving a window update on a subflow with some wspace available. Overall the above could lead to suboptimal aggregate bandwidth usage. Instead, we should try to push pending frames as soon as the subflow reaches both conditions mentioned above. We can finally enable self-tests with asymmetric links, as the above makes them finally pass. Fixes: 6f8a612a ("mptcp: keep track of advertised windows right edge") Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
MPTCP closes the subflows while holding the msk-level lock. While acquiring the subflow socket lock we need to use the correct nested annotation, or we can hit a lockdep splat at runtime. Reported-and-tested-by: NGeliang Tang <geliangtang@gmail.com> Fixes: e16163b6 ("mptcp: refactor shutdown and close") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Currently MPTCP is not propagating the security context from the ingress request socket to newly created msk at clone time. Address the issue invoking the missing security helper. Fixes: cf7da0d6 ("mptcp: Create SUBFLOW socket for incoming connections") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 15 12月, 2020 2 次提交
-
-
由 Paolo Abeni 提交于
Currently the xmit path of the MPTCP protocol creates smaller- than-max-size skbs, which is suboptimal for the performances. There are a few things to improve: - when coalescing to an existing skb, must clear the PUSH flag - tcp_build_frag() expect the available space as an argument. When coalescing is enable MPTCP already subtracted the to-be-coalesced skb len. We must increment said argument accordingly. Before: ./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM [...] 131072 16384 16384 30.00 24414.86 After: ./use_mptcp.sh netperf -H 127.0.0.1 -t TCP_STREAM [...] 131072 16384 16384 30.05 28357.69 Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
parse the MPTCP FASTCLOSE subtype. If provided key matches the local one, schedule the work queue to close (with tcp reset) all subflows. The MPTCP socket moves to closed state immediately. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 10 12月, 2020 2 次提交
-
-
由 Paolo Abeni 提交于
When the workqueue disposes of the msk, the subflows can still receive some data from the peer after __mptcp_close_ssk() completes. The above could trigger a race between the msk receive path and the msk destruction. Acquiring the mptcp_data_lock() in __mptcp_destroy_sock() will not save the day: the rx path could be reached even after msk destruction completes. Instead use the subflow 'disposable' flag to prevent entering the msk receive path after __mptcp_close_ssk(). Fixes: e16163b6 ("mptcp: refactor shutdown and close") Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
Christoph reported the following splat: WARNING: CPU: 0 PID: 4615 at net/ipv4/inet_connection_sock.c:1031 inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Modules linked in: CPU: 0 PID: 4615 Comm: syz-executor.4 Not tainted 5.9.0 #37 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:inet_csk_listen_stop+0x8e8/0xad0 net/ipv4/inet_connection_sock.c:1031 Code: 03 00 00 00 e8 79 b2 3d ff e9 ad f9 ff ff e8 1f 76 ba fe be 02 00 00 00 4c 89 f7 e8 62 b2 3d ff e9 14 f9 ff ff e8 08 76 ba fe <0f> 0b e9 97 f8 ff ff e8 fc 75 ba fe be 03 00 00 00 4c 89 f7 e8 3f RSP: 0018:ffffc900037f7948 EFLAGS: 00010293 RAX: ffff88810a349c80 RBX: ffff888114ee1b00 RCX: ffffffff827b14cd RDX: 0000000000000000 RSI: ffffffff827b1c38 RDI: 0000000000000005 RBP: ffff88810a2a8000 R08: ffff88810a349c80 R09: fffff520006fef1f R10: 0000000000000003 R11: fffff520006fef1e R12: ffff888114ee2d00 R13: dffffc0000000000 R14: 0000000000000001 R15: ffff888114ee1d68 FS: 00007f2ac1945700(0000) GS:ffff88811b400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd44798bc0 CR3: 0000000109810002 CR4: 0000000000170ef0 Call Trace: __tcp_close+0xd86/0x1110 net/ipv4/tcp.c:2433 __mptcp_close_ssk+0x256/0x430 net/mptcp/protocol.c:1761 __mptcp_destroy_sock+0x49b/0x770 net/mptcp/protocol.c:2127 mptcp_close+0x62d/0x910 net/mptcp/protocol.c:2184 inet_release+0xe9/0x1f0 net/ipv4/af_inet.c:434 __sock_release+0xd2/0x280 net/socket.c:596 sock_close+0x15/0x20 net/socket.c:1277 __fput+0x276/0x960 fs/file_table.c:281 task_work_run+0x109/0x1d0 kernel/task_work.c:151 get_signal+0xe8f/0x1d40 kernel/signal.c:2561 arch_do_signal+0x88/0x1b60 arch/x86/kernel/signal.c:811 exit_to_user_mode_loop kernel/entry/common.c:161 [inline] exit_to_user_mode_prepare+0x9b/0xf0 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x22/0x150 kernel/entry/common.c:266 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x7f2ac1254469 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48 RSP: 002b:00007f2ac1944dc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffbf RBX: 000000000069bf00 RCX: 00007f2ac1254469 RDX: 0000000000000000 RSI: 0000000000008982 RDI: 0000000000000003 RBP: 000000000069bf00 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000000000069bf0c R13: 00007ffeb53f178f R14: 00000000004668b0 R15: 0000000000000003 After commit 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list"), the msk's workqueue and/or PM can touch the MPC subflow - and acquire its socket lock - even if it's still unaccepted. If the above event races with the relevant listener socket close, we can end-up with the above splat. This change addresses the issue delaying the MPC socket insertion in conn_list at accept time - that is, partially reverting the blamed commit. We must additionally ensure that mptcp_pm_fully_established() happens after accept() time, or the PM will not be able to handle properly such event - conn_list could be empty otherwise. In the receive path, we check the subflow list node to ensure it is out of the listener queue. Be sure client subflows do not match transiently such condition moving them into the join list earlier at creation time. Since we now have multiple mptcp_pm_fully_established() call sites from different code-paths, said helper can now race with itself. Use an additional PM status bit to avoid multiple notifications. Reported-by: NChristoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/103 Fixes: 0397c6d8 ("mptcp: keep unaccepted MPC subflow into join list"), Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 12月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
If a packet is ready in receive queue, and application isssues a recvmsg()/recvfrom()/recvmmsg() request asking for zero bytes, we hang in mptcp_recvmsg(). Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: NEric Dumazet <edumazet@google.com> Tested-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Link: https://lore.kernel.org/r/20201202171657.1185108-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 01 12月, 2020 5 次提交
-
-
由 Paolo Abeni 提交于
We have some tasks triggered by the subflow receive path which require to access the msk socket status, specifically: mptcp_clean_una() and mptcp_push_pending() We have almost everything in place to defer to the msk release_cb such tasks when the msk sock is owned. Since the worker is no more used to clean the acked data, for fallback sockets we need to explicitly flush them. As an added bonus we can move the wake-up code in __mptcp_clean_una(), simplify a lot mptcp_poll() and move the timer update under the data lock. The worker is now used only to process and send DATA_FIN packets and do the mptcp-level retransmissions. Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Extending the data_lock scope in mptcp_incoming_option we can use that to protect both snd_una and wnd_end. In the typical case, we will have a single atomic op instead of 2 Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Move the TX skbs allocation in mptcp_sendmsg() scope, and tentatively pre-allocate a skbs number proportional to the sendmsg() length. Use the ssk tx skb cache to prevent the subflow allocation. This allows removing the msk skb extension cache and will make possible the later patches. Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Such spinlock is currently used only to protect the 'owned' flag inside the socket lock itself. With this patch, we extend its scope to protect the whole msk receive path and sk_forward_memory. Given the above, we can always move data into the msk receive queue (and OoO queue) from the subflow. We leverage the previous commit, so that we need to acquire the spinlock in the tx path only when moving fwd memory. recvmsg() must now explicitly acquire the socket spinlock when moving skbs out of sk_receive_queue. To reduce the number of lock operations required we use a second rx queue and splice the first into the latter in mptcp_lock_sock(). Additionally rmem allocated memory is bulk-freed via release_cb() Acked-by: NFlorian Westphal <fw@strlen.de> Co-developed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
This leverages the previous commit to reserve the wmem required for the sendmsg() operation when the msk socket lock is first acquired. Some heuristics are used to get a reasonable [over] estimation of the whole memory required. If we can't forward alloc such amount fallback to a reasonable small chunk, otherwise enter the wait for memory path. When sendmsg() needs more memory it looks at wmem_reserved first and if that is exhausted, move more space from sk_forward_alloc. The reserved memory is not persistent and is released at the next socket unlock via the release_cb(). Overall this will simplify the next patch. Acked-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 26 11月, 2020 2 次提交
-
-
由 Paolo Abeni 提交于
We can enter the main mptcp_recvmsg() loop even when no subflows are connected. As note by Eric, that would result in a divide by zero oops on ack generation. Address the issue by checking the subflow status before sending the ack. Additionally protect mptcp_recvmsg() against invocation with weird socket states. v1 -> v2: - removed unneeded inline keyword - Jakub Reported-and-suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Fixes: ea4ca586 ("mptcp: refine MPTCP-level ack scheduling") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/5370c0ae03449239e3d1674ddcfb090cf6f20abe.1606253206.git.pabeni@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
On close this timer might be scheduled. mptcp uses sk_reset_timer for this, so the a reference on the mptcp socket is taken. This causes a refcount leak which can for example be reproduced with 'mp_join_server_v4.pkt' from the mptcp-packetdrill repo. The leak has nothing to do with join requests, v1_mp_capable_bind_no_cs.pkt works too when replacing the last ack mpcapable to v1 instead of v0. unreferenced object 0xffff888109bba040 (size 2744): comm "packetdrill", [..] backtrace: [..] sk_prot_alloc.isra.0+0x2b/0xc0 [..] sk_clone_lock+0x2f/0x740 [..] mptcp_sk_clone+0x33/0x1a0 [..] subflow_syn_recv_sock+0x2b1/0x690 [..] Fixes: e16163b6 ("mptcp: refactor shutdown and close") Cc: Davide Caratti <dcaratti@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Acked-by: NPaolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20201124162446.11448-1-fw@strlen.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 21 11月, 2020 7 次提交
-
-
由 Paolo Abeni 提交于
Send timely MPTCP-level ack is somewhat difficult when the insertion into the msk receive level is performed by the worker. It needs TCP-level dup-ack to notify the MPTCP-level ack_seq increase, as both the TCP-level ack seq and the rcv window are unchanged. We can actually avoid processing incoming data with the worker, and let the subflow or recevmsg() send ack as needed. When recvmsg() moves the skbs inside the msk receive queue, the msk space is still unchanged, so tcp_cleanup_rbuf() could end-up skipping TCP-level ack generation. Anyway, when __mptcp_move_skbs() is invoked, a known amount of bytes is going to be consumed soon: we update rcv wnd computation taking them in account. Additionally we need to explicitly trigger tcp_cleanup_rbuf() when recvmsg() consumes a significant amount of the receive buffer. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
OoO handling attempts to detect when packet is out-of-window by testing current ack sequence and remaining space vs. sequence number. This doesn't work reliably. Store the highest allowed sequence number that we've announced and use it to detect oow packets. Do this when mptcp options get written to the packet (wire format). For this to work we need to move the write_options call until after stack selected a new tcp window. Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Geliang Tang 提交于
When ADD_ADDR suboption includes an IPv6 address, the size is 28 octets. It will not fit when other MPTCP suboptions are included in this packet, e.g. DSS. So here we send out an ADD_ADDR dedicated packet to carry only ADD_ADDR suboption, no other MPTCP suboptions. In mptcp_pm_announce_addr, we check whether this is an IPv6 ADD_ADDR. If it is, we set the flag MPTCP_ADD_ADDR_IPV6 to true. Then we call mptcp_pm_nl_add_addr_send_ack to sent out a new pure ACK packet. In mptcp_established_options_add_addr, we check whether this is a pure ACK packet for ADD_ADDR. If it is, we drop all other MPTCP suboptions in this packet, only put ADD_ADDR suboption in it. Suggested-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
This will simplify all operation dealing with subflows before accept time (e.g. data fin processing, add_addr). The join list is already flushed by mptcp_stream_accept() before returning the newly created msk to the user space. This also fixes an potential bug present into the old code: conn_list was manipulated without helding the msk lock in mptcp_stream_accept(). Tested-by: NGeliang Tang <geliangtang@gmail.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
In case a subflow path is blocked, MPTCP-level retransmit may not take place anymore because such subflow is likely to have unacked data left in its write queue. Ignore subflows that have experienced loss and test next candidate. Fixes: 3b1d6210 ("mptcp: implement and use MPTCP-level retransmission") Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
We need to cope with some more state transition for fallback sockets, or could still end-up moving to TCP_CLOSE too early and avoid spooling some pending data Fixes: e16163b6 ("mptcp: refactor shutdown and close") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Only mptcp_close() can actually cancel the workqueue, no need to add and use this flag. Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 20 11月, 2020 1 次提交
-
-
由 Paolo Abeni 提交于
We must start the retransmission timer only there are pending data in the rtx queue. Otherwise we can hit a WARN_ON in mptcp_reset_timer(), as syzbot demonstrated. Reported-and-tested-by: syzbot+42aa53dafb66a07e5a24@syzkaller.appspotmail.com Fixes: d9ca1de8 ("mptcp: move page frag allocation in mptcp_sendmsg()") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com> Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Tested-by: NNaresh Kamboju <naresh.kamboju@linaro.org> Link: https://lore.kernel.org/r/1a72039f112cae048c44d398ffa14e0a1432db3d.1605737083.git.pabeni@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 17 11月, 2020 10 次提交
-
-
由 Paolo Abeni 提交于
When the worker moves some bytes from the OoO queue into the receive queue, the msk->ask_seq is updated, the MPTCP-level ack carrying that value needs to wait the next ingress packet, possibly slowing down or hanging the peer Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
Before sending 'x' new bytes also check that the new snd_una would be within the permitted receive window. For every ACK that also contains a DSS ack, check whether its tcp-level receive window would advance the current mptcp window right edge and update it if so. Signed-off-by: NFlorian Westphal <fw@strlen.de> Co-developed-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Florian Westphal 提交于
MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at least one subflow and the mptcp socket itself are writeable. mptcp_poll returns EPOLLOUT if the bit is set. mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when last write has used up all subflows or the mptcp socket wmem. This reworks nospace handling as follows: MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, i.e. inverted meaning. This bit is set when the mptcp socket is not writeable. The mptcp-level ack path schedule will then schedule the mptcp worker to allow it to free already-acked data (and reduce wmem usage). This will then wake userspace processes that wait for a POLLOUT event. sendmsg will set MPTCP_NOSPACE only when it has to wait for more wmem (blocking I/O case). poll path will set MPTCP_NOSPACE in case the mptcp socket is not writeable. Normal tcp-level notification (SOCK_NOSPACE) is only enabled in case the subflow socket has no available wmem. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
After the previous patch we may end-up with unsent data in the write buffer. If such buffer is full, the writer will block for unlimited time. We need to trigger the MPTCP xmit path even for the subflow rx path, on MPTCP snd_una updates. Keep things simple and just schedule the work queue if needed. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
mptcp_sendmsg() is refactored so that first it copies the data provided from user space into the send queue, and then tries to spool the send queue via sendmsg_frag. There a subtle change in the mptcp level collapsing on consecutive data fragment: we now allow that only on unsent data. The latter don't need to deal with msghdr data anymore and can be simplified in a relevant way. snd_nxt and write_seq are now tracked independently. Overall this allows some relevant cleanup and will allow sending pending mptcp data on msk una update in later patch. Co-developed-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
We must not close the subflows before all the MPTCP level data, comprising the DATA_FIN has been acked at the MPTCP level, otherwise we could be unable to retransmit as needed. __mptcp_wr_shutdown() shutdown is responsible to check for the correct status and close all subflows. Is called by the output path after spooling any data and at shutdown/close time. In a similar way, __mptcp_destroy_sock() is responsible to clean-up the MPTCP level status, and is called when the msk transition to TCP_CLOSE. The protocol level close() does not force anymore the TCP_CLOSE status, but orphan the msk socket and all the subflows. Orphaned msk sockets are forciby closed after a timeout or when all MPTCP-level data is acked. There is a caveat about keeping the orphaned subflows around: the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via tcp_close(). To prevent accessing freed memory on later MPTCP level operations, the msk acquires a reference to each subflow socket and prevent subflow_ulp_release() from releasing the subflow context before __mptcp_destroy_sock(). The additional subflow references are released by __mptcp_done() and the async ULP release is detected checking ULP ops. If such field has been already cleared by the ULP release path, the dangling context is freed directly by __mptcp_done(). Co-developed-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NDavide Caratti <dcaratti@redhat.com> Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Track the next MPTCP sequence number used on xmit, currently always equal to write_next. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
Preparation patch to track the data pending in the msk write queue. No functional change introduced here Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
The current argument list is pretty long and quite unreadable, move many of them into a specific struct. Later patches will add more stuff to such struct. Additionally drop the 'timeo' argument, now unused. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Paolo Abeni 提交于
remove some of code duplications an allow preventing rescheduling on close. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-