Commits · 917944da3bfc7cb5ac3af26725af3371d3a12db0 · openeuler / Kernel

30 Sep, 2020 1 commit

mptcp: Consistently use READ_ONCE/WRITE_ONCE with msk->ack_seq · 917944da

Committed by Mat Martineau on Sep 29, 2020

The msk->ack_seq value is sometimes read without the msk lock held, so
make proper use of READ_ONCE and WRITE_ONCE.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

917944da
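
As an illustration of the pattern this commit applies, here is a minimal userspace sketch. The macros READ_ONCE_U64/WRITE_ONCE_U64 and the struct msk_sketch are hypothetical, simplified stand-ins for the kernel's READ_ONCE()/WRITE_ONCE() and struct mptcp_sock; the real kernel macros are generic over the operand type and carry extra checks.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel macros (simplified): the volatile
 * access forces exactly one load or store and keeps the compiler from
 * caching, re-reading or splitting the access. */
#define READ_ONCE_U64(x)       (*(const volatile uint64_t *)&(x))
#define WRITE_ONCE_U64(x, val) (*(volatile uint64_t *)&(x) = (val))

/* Hypothetical, heavily reduced stand-in for struct mptcp_sock. */
struct msk_sketch {
	uint64_t ack_seq;	/* next expected MPTCP-level sequence number */
};

/* Writer side: in the kernel this would run with the msk lock held. */
static void msk_advance_ack_seq(struct msk_sketch *msk, uint64_t delta)
{
	assert(msk != 0);
	WRITE_ONCE_U64(msk->ack_seq, msk->ack_seq + delta);
}

/* Lockless reader side, for example while building an outgoing ack. */
static uint64_t msk_peek_ack_seq(const struct msk_sketch *msk)
{
	assert(msk != 0);
	return READ_ONCE_U64(msk->ack_seq);
}
```

The writer may still read the field with a plain access because it holds the lock; only accesses that can race with a lockless reader need the annotation.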

27 Aug, 2020 1 commit

mptcp: free acked data before waiting for more memory · 1cec170d

Committed by Florian Westphal on Aug 26, 2020

After subflow lock is dropped, more wmem might have been made available.

This fixes a deadlock in mptcp_connect.sh 'mmap' mode: wmem is exhausted.
But as the mptcp socket holds on to already-acked data (for retransmit)
no wakeup will occur.

Using 'goto restart' calls mptcp_clean_una(sk) which will free pages
that have been acked completely in the mean time.

Fixes: fb529e62 ("mptcp: break and restart in case mptcp sndbuf is full")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

1cec170d

24 Aug, 2020 1 commit

treewide: Use fallthrough pseudo-keyword · df561f66

Committed by Gustavo A. R. Silva on Aug 23, 2020

Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

df561f66
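
To show what the pseudo-keyword looks like in use, here is a small self-contained sketch. The macro definition mirrors how the kernel maps fallthrough onto the compiler attribute when it is available; the function cleanup_steps() is a hypothetical example, not code from the commit.

```c
#include <assert.h>

/* Map the pseudo-keyword onto the statement attribute when the
 * compiler supports it (GCC >= 7, recent Clang). */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)	/* fallback: plain no-op */
#endif

/* Hypothetical example: count how many cleanup stages run for a level. */
static int cleanup_steps(int level)
{
	int steps = 0;

	assert(level >= 0);
	switch (level) {
	case 2:
		steps++;	/* undo stage 2 ... */
		fallthrough;	/* ... then deliberately continue with stage 1 */
	case 1:
		steps++;
		break;
	default:
		break;
	}
	return steps;
}
```

Unlike a comment, the attribute is checked by the compiler, so -Wimplicit-fallthrough can flag every case that falls through without it.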

17 Aug, 2020 1 commit

mptcp: sendmsg: reset iter on error redux · b3b2854d

Committed by Florian Westphal on Aug 16, 2020

This fix wasn't correct: When this function is invoked from the
retransmission worker, the iterator contains garbage and resetting
it causes a crash.

As the work queue should not be performance critical also zero the
msghdr struct.

Fixes: 35759383 ("mptcp: sendmsg: reset iter on error")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

b3b2854d

15 Aug, 2020 1 commit

mptcp: sendmsg: reset iter on error · 35759383

Committed by Florian Westphal on Aug 14, 2020

Once we've copied data from the iterator we need to revert in case we
end up not sending any data.

This bug doesn't trigger with normal 'poll' based tests, because
we only feed a small chunk of data to kernel after poll indicated
POLLOUT.  With blocking IO and large writes this triggers. Receiver
ends up with less data than it should get.

Fixes: 72511aab ("mptcp: avoid blocking in tcp_sendpages")
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

35759383
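
The copy-then-revert idea described in this commit can be sketched in userspace as follows. All names here (struct iter_sketch, iter_copy, iter_revert, send_chunk) are hypothetical; in the kernel the rewind role is played by iov_iter_revert() on a struct iov_iter, and the send path is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified iterator over a flat user buffer. */
struct iter_sketch {
	const char *base;
	size_t off;	/* bytes already consumed */
	size_t len;	/* total bytes available */
};

/* Copy up to n bytes out of the iterator, advancing it. */
static size_t iter_copy(struct iter_sketch *it, char *dst, size_t n)
{
	size_t left = it->len - it->off;

	if (n > left)
		n = left;
	memcpy(dst, it->base + it->off, n);
	it->off += n;
	return n;
}

/* Undo the last n consumed bytes. */
static void iter_revert(struct iter_sketch *it, size_t n)
{
	assert(n <= it->off);
	it->off -= n;
}

/* Stage data, then try to send it; sent_ok models whether the lower
 * layer accepted the staged bytes. Returns the bytes actually sent. */
static size_t send_chunk(struct iter_sketch *it, char *page, size_t n,
			 int sent_ok)
{
	size_t copied = iter_copy(it, page, n);

	if (!sent_ok) {
		iter_revert(it, copied);	/* nothing went out: rewind */
		return 0;
	}
	return copied;
}
```

Without the rewind, a failed attempt would leave the iterator advanced, and the next attempt would skip bytes the receiver never got, which matches the "receiver ends up with less data" symptom in the message.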

04 Aug, 2020 2 commits

mptcp: fix bogus sendmsg() return code under pressure · 8555c6bf

Committed by Paolo Abeni on Aug 03, 2020

In case of memory pressure, mptcp_sendmsg() may call
sk_stream_wait_memory() after successfully transmitting some
bytes. If the latter fails, we currently return the error
code to user space, ignoring the successful transmission.

Address the issue by always checking for transmitted bytes
before mptcp_sendmsg() completes.

Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8555c6bf
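
The rule this fix enforces can be stated as a tiny helper. This is only a sketch: sendmsg_result() is a hypothetical name, and the actual change lives inside mptcp_sendmsg() rather than in a separate function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: pick the sendmsg() return value once the send
 * loop ends. copied is the number of bytes already transmitted, err is
 * 0 or a negative errno from a failed wait for memory. */
static long sendmsg_result(size_t copied, int err)
{
	assert(err <= 0);
	if (copied > 0)
		return (long)copied;	/* partial success wins over the error */
	return err;
}
```

Returning the byte count first lets the caller resume from the right offset; the error will surface again on the next call if the condition persists.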

mptcp: use mptcp_for_each_subflow in mptcp_stream_accept · 190f8b06

Committed by Geliang Tang on Aug 03, 2020

Use mptcp_for_each_subflow in mptcp_stream_accept instead of
open-coding.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

190f8b06

29 Jul, 2020 9 commits

mptcp: Safely store sequence number when sending data · 721e9089

Committed by Mat Martineau on Jul 28, 2020

The MPTCP socket's write_seq member can be read without the msk lock
held, so use WRITE_ONCE() to store it.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

721e9089

mptcp: Safely read sequence number when lock isn't held · c7529392

Committed by Mat Martineau on Jul 28, 2020

The MPTCP socket's write_seq member should be read with READ_ONCE() when
the msk lock is not held.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7529392

mptcp: Use full MPTCP-level disconnect state machine · 43b54c6e

Committed by Mat Martineau on Jul 28, 2020

RFC 8684 appendix D describes the connection state machine for
MPTCP. This patch implements the DATA_FIN / DATA_ACK exchanges and
MPTCP-level socket state changes described in that appendix, rather than
simply sending DATA_FIN along with TCP FIN when disconnecting subflows.

DATA_FIN is now sent and acknowledged before shutting down the
subflows. Received DATA_FIN information (if not part of a data packet)
is written to the MPTCP socket when the incoming DSS option is parsed by
the subflow, and the MPTCP worker is scheduled to process the
flag. DATA_FIN received as part of a full DSS mapping will be handled
when the mapping is processed.

The DATA_FIN is acknowledged by the worker if the reader is caught
up. If there is still data to be moved to the MPTCP-level queue, ack_seq
will be incremented to account for the DATA_FIN when it reaches the end
of the stream and a DATA_ACK will be sent to the peer.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43b54c6e

mptcp: Add helper to process acks of DATA_FIN · 16a9a9da

Committed by Mat Martineau on Jul 28, 2020

After DATA_FIN has been sent, the peer will acknowledge it. An ack of
the relevant MPTCP-level sequence number will update the MPTCP
connection state appropriately.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16a9a9da

mptcp: Add mptcp_close_state() helper · 6920b851

Committed by Mat Martineau on Jul 28, 2020

This will be used to transition to the appropriate state on close and
determine if a DATA_FIN needs to be sent for that state transition.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6920b851
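
A helper of this kind is naturally table-driven. The sketch below is an assumption about its shape, modelled on the usual TCP close transitions; the enum, the table and close_state_sketch() are hypothetical names, and the state set is reduced compared with the TCP_* states the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced state set; the kernel reuses the TCP_* states. */
enum conn_state {
	ST_ESTABLISHED, ST_CLOSE_WAIT, ST_FIN_WAIT1, ST_LAST_ACK, ST_CLOSE,
	ST_MAX
};

struct close_step {
	enum conn_state next;
	bool send_fin;	/* does this transition require sending DATA_FIN? */
};

static const struct close_step close_table[ST_MAX] = {
	[ST_ESTABLISHED] = { ST_FIN_WAIT1, true  },	/* active close */
	[ST_CLOSE_WAIT]  = { ST_LAST_ACK,  true  },	/* peer closed first */
	[ST_FIN_WAIT1]   = { ST_FIN_WAIT1, false },	/* FIN already queued */
	[ST_LAST_ACK]    = { ST_LAST_ACK,  false },
	[ST_CLOSE]       = { ST_CLOSE,     false },
};

/* Move *state to its on-close successor; return true if a DATA_FIN
 * must be sent for that transition. */
static bool close_state_sketch(enum conn_state *state)
{
	assert(*state < ST_MAX);
	const struct close_step step = close_table[*state];

	*state = step.next;
	return step.send_fin;
}
```

Calling the helper twice on the same socket is harmless in this sketch: the second call finds the state already advanced and reports that no further DATA_FIN is needed.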

mptcp: Track received DATA_FIN sequence number and add related helpers · 3721b9b6

Committed by Mat Martineau on Jul 28, 2020

Incoming DATA_FIN headers need to propagate the presence of the DATA_FIN
bit and the associated sequence number to the MPTCP layer, even when
arriving on a bare ACK that does not get added to the receive queue. Add
structure members to store the DATA_FIN information and helpers to set
and check those values.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3721b9b6

mptcp: Use MPTCP-level flag for sending DATA_FIN · 7279da61

Committed by Mat Martineau on Jul 28, 2020

Since DATA_FIN information is the same for every subflow, store it only
in the mptcp_sock.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7279da61

mptcp: Remove outdated and incorrect comment · 242e63f6

Committed by Mat Martineau on Jul 28, 2020

mptcp_close() acquires the msk lock, so it clearly should not be held
before the function is called.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

242e63f6

mptcp: Return EPIPE if sending is shut down during a sendmsg · 57baaf28

Committed by Mat Martineau on Jul 28, 2020

An MPTCP socket where sending has been shut down should not attempt to
send additional data, since DATA_FIN has already been sent.

Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57baaf28

28 Jul, 2020 1 commit

mptcp: fix joined subflows with unblocking sk · 367fe04e

Committed by Matthieu Baerts on Jul 27, 2020

Unblocking sockets used for outgoing connections were not containing
inet info about the initial connection due to a typo there: the value of
"err" variable is negative in the kernelspace.

This fixes the creation of additional subflows where the remote port has
to be reused if the other host didn't announce another one. This also
fixes inet_diag showing blank info about MPTCP sockets from unblocking
sockets doing a connect().

Fixes: 41be81a8 ("mptcp: fix unblocking connect()")
Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

367fe04e
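
The sign convention behind this typo is easy to get wrong: kernel functions return 0 or a negative errno, whereas userspace sees positive values in errno. This page does not show the diff, so the exact condition below is an assumption, and connect_started() is a hypothetical name used only to illustrate the comparison.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical check: did the subflow connect() either complete or get
 * started in the background? err follows the kernel convention of
 * 0 or a negative errno. */
static bool connect_started(int err)
{
	assert(err <= 0);
	/* Comparing against the positive EINPROGRESS would never match. */
	return err == 0 || err == -EINPROGRESS;
}
```

With the positive constant, the in-progress case of a non-blocking connect() would be treated as a failure and the address information would never be copied.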

25 Jul, 2020 2 commits

net: pass a sockptr_t into ->setsockopt · a7b75c5a

Committed by Christoph Hellwig on Jul 23, 2020

Rework the remaining setsockopt code to pass a sockptr_t instead of a
plain user pointer.  This removes the last remaining set_fs(KERNEL_DS)
outside of architecture specific code.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154]
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7b75c5a

net: switch sock_set_timeout to sockptr_t · c8c1bbb6

Committed by Christoph Hellwig on Jul 23, 2020

Pass a sockptr_t to prepare for set_fs-less handling of the kernel
pointer from bpf-cgroup.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

c8c1bbb6

24 Jul, 2020 4 commits

mptcp: explicitly track the fully established status · b93df08c

Committed by Paolo Abeni on Jul 23, 2020

Currently accepted msk sockets become established only after
accept() returns the new sk to user-space.

As MP_JOIN requests are refused as per RFC spec on non fully
established sockets, the above causes mp_join self-test
instabilities.

This change lets the msk enter the established status
as soon as it receives the 3rd ack and propagates the first
subflow fully established status on the msk socket.

Finally we can change the subflow acceptance condition to
take into account both the sock state and the msk fully
established flag.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b93df08c

mptcp: mark as fallback even early ones · 0235d075

Committed by Paolo Abeni on Jul 23, 2020

In the unlikely event of a failure at connect time,
we currently clear the request_mptcp flag - so that
the MPC handshake is not started at all, but the msk
is not explicitly marked as fallback.

This would lead to later insertion of wrong DSS options
in the xmitted packets, in violation of RFC specs and
possibly fooling the peer.

Fixes: e1ff9e82 ("net: mptcp: improve fallback to TCP")
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0235d075

mptcp: avoid data corruption on reinsert · 53eb4c38

Committed by Paolo Abeni on Jul 23, 2020

When updating a partially acked data fragment, we
actually corrupt it. This is irrelevant as long as we send
data on a single subflow, as retransmitted data, if
any, are discarded by the peer as duplicates, but it
will cause data corruption as soon as we start
creating non-backup subflows.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

53eb4c38

subflow: always init 'rel_write_seq' · b0977bb2

Committed by Paolo Abeni on Jul 23, 2020

Currently we do not init the subflow write sequence for
MP_JOIN subflows. This will cause bad mappings to be
generated as soon as we use non-backup subflows.

Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Tested-by: Christoph Paasch <cpaasch@apple.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b0977bb2

20 Jul, 2020 1 commit

net: remove compat_sock_common_{get,set}sockopt · 8c918ffb

Committed by Christoph Hellwig on Jul 17, 2020

Add the compat handling to sock_common_{get,set}sockopt instead,
keyed off in_compat_syscall().  This allows removing the now unused
->compat_{get,set}sockopt methods from struct proto_ops.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8c918ffb

08 Jul, 2020 1 commit

mptcp: use mptcp worker for path management · b416268b

Committed by Florian Westphal on Jul 07, 2020

We can re-use the existing work queue to handle path management
instead of a dedicated work queue. Just move pm_worker to protocol.c,
call it from the mptcp worker and get rid of the msk lock (already held).

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b416268b

05 Jul, 2020 3 commits

mptcp: support IPV6_V6ONLY setsockopt · c9b95a13

Committed by Florian Westphal on Jul 05, 2020

Without this, Opensshd fails to open an ipv6 listening
socket:
  error: setsockopt IPV6_V6ONLY: Operation not supported
  error: Bind to port 22 on :: failed: Address already in use.

Opensshd opens an ipv4 and an ipv6 listening socket, but because
IPV6_V6ONLY setsockopt fails, the port number is already in use.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

c9b95a13

mptcp: add REUSEADDR/REUSEPORT support · fd1452d8

Committed by Florian Westphal on Jul 05, 2020

This will e.g. make 'sshd restart' work when MPTCP is used, as we will
now set this option on the listener socket instead of only the mptcp
socket (where it has no effect).

We still need to copy the setting to the master socket so that a
subsequent getsockopt() returns the expected value.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd1452d8

net: use mptcp setsockopt function for SOL_SOCKET on mptcp sockets · 83f0c10b

Committed by Florian Westphal on Jul 05, 2020

setsockopt(mptcp_fd, SOL_SOCKET, ...)...  appears to work (returns 0),
but it has no effect -- this is because the MPTCP layer never has a
chance to copy the settings to the subflow socket.

Skip the generic handling for the mptcp case and call the
mptcp specific handler for SOL_SOCKET too.

Next patch adds more specific handling for SOL_SOCKET to mptcp.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

83f0c10b

02 Jul, 2020 1 commit

mptcp: add receive buffer auto-tuning · a6b118fe

Committed by Florian Westphal on Jun 30, 2020

When mptcp is used, userspace doesn't read from the tcp (subflow)
socket but from the parent (mptcp) socket receive queue.

skbs are moved from the subflow socket to the mptcp rx queue either from
'data_ready' callback (if mptcp socket can be locked), a work queue, or
the socket receive function.

This means tcp_rcv_space_adjust() is never called and thus no receive
buffer size auto-tuning is done.

An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
function that moves skbs from subflow to mptcp socket.
While this enabled autotuning, it also meant tuning was done even if
userspace was reading the mptcp socket very slowly.

This adds mptcp_rcv_space_adjust() and calls it after userspace has
read data from the mptcp socket rx queue.

It's very similar to tcp_rcv_space_adjust, with two differences:

1. The rtt estimate is the largest one observed on a subflow
2. The rcvbuf size and window clamp of all subflows are adjusted
   to the mptcp-level rcvbuf.

Otherwise, we get spurious drops at tcp (subflow) socket level if
the skbs are not moved to the mptcp socket fast enough.

Before:
time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
[..]
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40823ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration 23119ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5421ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41446ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration 23427ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5426ms) [ OK ]
Time: 1396 seconds

After:
ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5427ms) [ OK ]
ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5422ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5415ms) [ OK ]
ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5422ms) [ OK ]
ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5423ms) [ OK ]
Time: 296 seconds

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

a6b118fe
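
The two differences from tcp_rcv_space_adjust listed in this commit can be sketched as two small helpers. Everything here (struct subflow_sketch, its fields, and both function names) is a hypothetical simplification; the kernel works on struct sock and struct tcp_sock and also computes the new buffer size, which this sketch takes as an input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced per-subflow state. */
struct subflow_sketch {
	uint32_t rtt_us;	/* receiver-side RTT estimate */
	uint32_t rcvbuf;	/* receive buffer size in bytes */
	uint32_t window_clamp;	/* upper bound for the advertised window */
};

/* Difference 1: use the largest RTT seen on any subflow. */
static uint32_t max_subflow_rtt(const struct subflow_sketch *sf, size_t n)
{
	uint32_t rtt = 0;

	assert(sf != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (sf[i].rtt_us > rtt)
			rtt = sf[i].rtt_us;
	return rtt;
}

/* Difference 2: push the MPTCP-level rcvbuf and clamp to every subflow. */
static void propagate_rcvbuf(struct subflow_sketch *sf, size_t n,
			     uint32_t msk_rcvbuf, uint32_t msk_clamp)
{
	assert(sf != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		sf[i].rcvbuf = msk_rcvbuf;
		sf[i].window_clamp = msk_clamp;
	}
}
```

Taking the largest RTT is the conservative choice: the measurement interval then covers the slowest path, so the buffer is not sized for a fast subflow alone.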

30 Jun, 2020 5 commits

mptcp: close poll() races · 8a05661b

Committed by Paolo Abeni on Jun 29, 2020

mptcp_poll always returns POLLOUT for unblocking
connect(); ensure that the socket is in a suitable
state.
The MPTCP_DATA_READY bit is never cleared on accept:
ensure we don't leave mptcp_accept() with an empty
accept queue and such bit set.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8a05661b

mptcp: __mptcp_tcp_fallback() returns a struct sock · 76660afb

Committed by Paolo Abeni on Jun 29, 2020

Currently __mptcp_tcp_fallback() always returns NULL
on incoming connections, because MPTCP does not create
the additional socket for the first subflow.
Since the previous commit no __mptcp_tcp_fallback()
caller needs a struct socket, so let __mptcp_tcp_fallback()
return the first subflow sock and cope correctly even with
incoming connections.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76660afb

mptcp: create first subflow at msk creation time · fa68018d

Committed by Paolo Abeni on Jun 29, 2020

This cleans the code a bit and makes the behavior more consistent.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fa68018d

mptcp: check for plain TCP sock at accept time · d2f77c53

Committed by Paolo Abeni on Jun 29, 2020

This cleans up the code a bit and avoids corrupted states
on weird syscall sequences (accept(), connect()).

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2f77c53

net: mptcp: improve fallback to TCP · e1ff9e82

Committed by Davide Caratti on Jun 29, 2020

Keep using MPTCP sockets and use a "dummy mapping" in case of fallback
to regular TCP. When fallback is triggered, skip addition of the MPTCP
option on send.

Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
Co-developed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e1ff9e82

27 Jun, 2020 2 commits

mptcp: refactor token container · 2c5ebd00

Committed by Paolo Abeni on Jun 26, 2020

Replace the radix tree with a hash table allocated
at boot time. The radix tree has some shortcomings:
a single lock is contended by all the mptcp operations,
the lookup currently uses such lock, and traversing
all the items would require a lock, too.

With a hash table instead we trade a little memory to
address all the above - a per bucket lock is used.

To hash the MPTCP sockets, we re-use the msk's sk_node
entry: the MPTCP sockets are never hashed by the stack.
Replace the existing hash proto callbacks with a dummy
implementation, annotating the above constraint.

Additionally refactor the token creation code to:

- limit the number of consecutive attempts to a fixed
maximum. Hitting a hash bucket with a long chain is
considered a failed attempt

- accept() can no longer fail due to token management.

- if token creation fails at connect() time, we do
fallback to TCP (previously the connection was closed)

v1 -> v2:
 - fix "no newline at end of file" - Jakub

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c5ebd00
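
The bounded-retry token creation described above might look roughly like the sketch below. All names and constants are hypothetical, the per-bucket lock is omitted because the sketch is single-threaded, and seq_token() is a deterministic demo generator (the kernel derives tokens from randomly generated keys).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBUCKETS     8	/* hypothetical; must be a power of two */
#define MAX_CHAIN    2	/* a longer chain counts as a failed attempt */
#define MAX_ATTEMPTS 4	/* fixed maximum of consecutive attempts */

/* One bucket; a real implementation would also hold a per-bucket
 * lock here. */
struct bucket_sketch {
	uint32_t tokens[MAX_CHAIN];
	int chain_len;
};

static struct bucket_sketch table[NBUCKETS];

static bool bucket_has(const struct bucket_sketch *b, uint32_t token)
{
	for (int i = 0; i < b->chain_len; i++)
		if (b->tokens[i] == token)
			return true;
	return false;
}

/* Demo generator: multiples of NBUCKETS, so every token lands in
 * bucket 0 and the chain limit is easy to hit. */
static uint32_t seq_token(void)
{
	static uint32_t next = NBUCKETS;
	uint32_t t = next;

	next += NBUCKETS;
	return t;
}

/* Try candidate tokens until one can be inserted. Returns true and
 * stores it in *out, or false after MAX_ATTEMPTS (the caller would
 * then fall back to plain TCP). */
static bool token_new(uint32_t (*next_token)(void), uint32_t *out)
{
	assert(next_token && out);
	for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
		uint32_t token = next_token();
		struct bucket_sketch *b = &table[token & (NBUCKETS - 1)];

		if (b->chain_len >= MAX_CHAIN || bucket_has(b, token))
			continue;	/* failed attempt: full or duplicate */
		b->tokens[b->chain_len++] = token;
		*out = token;
		return true;
	}
	return false;
}
```

Because each attempt only touches one bucket, a per-bucket lock would be enough to protect it, which is the contention win the commit message describes over a single tree-wide lock.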

mptcp: add __init annotation on setup functions · d39dceca

Committed by Paolo Abeni on Jun 26, 2020

Add the missing annotation in some setup-only
functions.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d39dceca

11 Jun, 2020 1 commit

mptcp: fix races between shutdown and recvmsg · 5969856a

Committed by Paolo Abeni on Jun 10, 2020

The msk sk_shutdown flag is set by a workqueue, possibly
introducing some delay in user-space notification. If the last
subflow carries some data with the fin packet, the user space
can wake-up before RCV_SHUTDOWN is set. If it executes unblocking
recvmsg(), it may return with an error instead of eof.

Address the issue by explicitly checking for eof in recvmsg(), when
no data is found.

Fixes: 59832e24 ("mptcp: subflow: check parent mptcp socket on subflow state change")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5969856a

31 May, 2020 3 commits

mptcp: remove msk from the token container at destruction time. · c5c79763

Committed by Paolo Abeni on May 29, 2020

Currently we remove the msk from the token container only
via mptcp_close(). The MPTCP master socket can be destroyed
also via other paths (e.g. if not yet accepted, when shutting
down the listener socket). When we hit the latter scenario,
dangling msk references are left in the token container,
leading to memory corruption and/or UaF.

This change addresses the issue by moving the token removal
into the msk destructor.

Fixes: 79c0949e ("mptcp: Add key generation and token tree")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c5c79763

mptcp: fix race between MP_JOIN and close · 10f6d46c

Committed by Paolo Abeni on May 29, 2020

If a MP_JOIN subflow completes the 3whs while another
CPU is closing the master msk, we can hit the
following race:

CPU1                                    CPU2

close()
 mptcp_close
                                        subflow_syn_recv_sock
                                         mptcp_token_get_sock
                                         mptcp_finish_join
                                          inet_sk_state_load
  mptcp_token_destroy
  inet_sk_state_store(TCP_CLOSE)
  __mptcp_flush_join_list()
                                          mptcp_sock_graft
                                          list_add_tail
  sk_common_release
   sock_orphan()
 <socket free>

The MP_JOIN socket will be leaked. Additionally we can hit
UaF for the msk 'struct socket' referenced via the 'conn'
field.

This change tries to address the issue by introducing some
synchronization between the MP_JOIN 3whs and mptcp_close
via the join_list spinlock. If we detect the msk is closing,
the MP_JOIN socket is closed, too.

Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10f6d46c

mptcp: fix unblocking connect() · 41be81a8

Committed by Paolo Abeni on May 29, 2020

Currently unblocking connect() on MPTCP sockets fails frequently.
If mptcp_stream_connect() is invoked to complete a previously
attempted unblocking connection, it will still try to create
the first subflow via __mptcp_socket_create(). If the 3whs is
completed and the 'can_ack' flag is already set, the latter
will fail with -EINVAL.

This change addresses the issue by checking for a pending connect and
delegating the completion to the first subflow. Additionally
do msk addresses and sk_state changes only when needed.

Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

41be81a8
