1. 30 Mar, 2020 2 commits
  2. 24 Mar, 2020 1 commit
  3. 22 Mar, 2020 1 commit
  4. 20 Mar, 2020 1 commit
  5. 18 Mar, 2020 1 commit
  6. 15 Mar, 2020 2 commits
  7. 12 Mar, 2020 1 commit
    • net: mptcp: don't hang before sending 'MP capable with data' · 767d3ded
      Davide Caratti committed
      the following packetdrill script
      
        socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
        fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
        fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
        > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 flags[flag_h] nokey>
        < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 8,mpcapable v1 flags[flag_h] key[skey=2]>
        > . 1:1(0) ack 1 win 256 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h] key[ckey,skey]>
        getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
        fcntl(3, F_SETFL, O_RDWR) = 0
        write(3, ..., 1000) = 1000
      
      doesn't transmit 1KB data packet after a successful three-way-handshake,
      using mp_capable with data as required by protocol v1, and write() hangs
      forever:
      
       PID: 973    TASK: ffff97dd399cae80  CPU: 1   COMMAND: "packetdrill"
        #0 [ffffa9b94062fb78] __schedule at ffffffff9c90a000
        #1 [ffffa9b94062fc08] schedule at ffffffff9c90a4a0
        #2 [ffffa9b94062fc18] schedule_timeout at ffffffff9c90e00d
        #3 [ffffa9b94062fc90] wait_woken at ffffffff9c120184
        #4 [ffffa9b94062fcb0] sk_stream_wait_connect at ffffffff9c75b064
        #5 [ffffa9b94062fd20] mptcp_sendmsg at ffffffff9c8e801c
        #6 [ffffa9b94062fdc0] sock_sendmsg at ffffffff9c747324
        #7 [ffffa9b94062fdd8] sock_write_iter at ffffffff9c7473c7
        #8 [ffffa9b94062fe48] new_sync_write at ffffffff9c302976
        #9 [ffffa9b94062fed0] vfs_write at ffffffff9c305685
       #10 [ffffa9b94062ff00] ksys_write at ffffffff9c305985
       #11 [ffffa9b94062ff38] do_syscall_64 at ffffffff9c004475
       #12 [ffffa9b94062ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ca0008c
           RIP: 00007f959407eaf7  RSP: 00007ffe9e95a910  RFLAGS: 00000293
           RAX: ffffffffffffffda  RBX: 0000000000000008  RCX: 00007f959407eaf7
           RDX: 00000000000003e8  RSI: 0000000001785fe0  RDI: 0000000000000008
           RBP: 0000000001785fe0   R8: 0000000000000000   R9: 0000000000000003
           R10: 0000000000000007  R11: 0000000000000293  R12: 00000000000003e8
           R13: 00007ffe9e95ae30  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      Fix it by ensuring that the socket state is TCP_ESTABLISHED on reception
      of the third ack.
      
      Fixes: 1954b860 ("mptcp: Check connection state before attempting send")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 10 Mar, 2020 1 commit
  9. 06 Mar, 2020 1 commit
  10. 04 Mar, 2020 3 commits
  11. 27 Feb, 2020 8 commits
    • mptcp: add dummy icsk_sync_mss() · dc24f8b4
      Paolo Abeni committed
      syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
      callback, and was able to trigger a null pointer dereference:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
      RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
      RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
      RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
      R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
      R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
      FS:  0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
       netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
       smack_netlabel security/smack/smack_lsm.c:2425 [inline]
       smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
       security_inode_setsecurity+0xb2/0x140 security/security.c:1364
       __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
       vfs_setxattr fs/xattr.c:224 [inline]
       setxattr+0x335/0x430 fs/xattr.c:451
       __do_sys_fsetxattr fs/xattr.c:506 [inline]
       __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
       __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
       do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440199
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
      RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
      R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
      R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      CR2: 0000000000000000
      
      Address the issue by adding a dummy icsk_sync_mss callback.
      To properly sync the subflows' mss and options list we need some
      additional infrastructure, which will land in net-next.
      
      Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
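The oops in this commit comes from an indirect call through a callback pointer that was never set. A minimal user-space sketch of the dummy-callback idea is shown here; `demo_af_ops`, `demo_sync_mss_dummy` and `demo_update_pmtu` are hypothetical names chosen for illustration and are not the kernel's own structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an ops table holding a callback pointer. */
struct demo_af_ops {
	unsigned int (*sync_mss)(void *sk, unsigned int pmtu);
};

/* No-op callback: it accepts the call and returns 0, so a caller that
 * invokes the pointer unconditionally no longer jumps to address 0. */
static unsigned int demo_sync_mss_dummy(void *sk, unsigned int pmtu)
{
	(void)sk;
	(void)pmtu;
	return 0;
}

/* Caller modeled on code that does not NULL-check the callback itself. */
static unsigned int demo_update_pmtu(const struct demo_af_ops *ops, void *sk,
				     unsigned int pmtu)
{
	assert(ops != NULL);
	return ops->sync_mss(sk, pmtu);
}
```

Registering the no-op keeps every caller unchanged, which is probably why it suits a short-term fix while the proper subflow MSS sync is still pending.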
    • mptcp: defer work schedule until mptcp lock is released · 14c441b5
      Paolo Abeni committed
      Don't schedule the work queue right away, instead defer this
      to the lock release callback.
      
      This has the advantage that it will give recv path a chance to
      complete -- this might have moved all pending packets from the
      subflow to the mptcp receive queue, which allows us to avoid the
      schedule_work().
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid work queue scheduling if possible · 2e52213c
      Florian Westphal committed
      We can't lock_sock() the mptcp socket from the subflow data_ready callback,
      it would result in ABBA deadlock with the subflow socket lock.
      
      We can however grab the spinlock: if that succeeds and the mptcp socket
      is not owned at the moment, we can process the new skbs right away
      without deferring this to the work queue.
      
      This avoids the schedule_work and hence the small delay until the
      work item is processed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
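The fast path described in this commit can be sketched in user space as "try the spinlock, check ownership, otherwise defer". This is only an illustration under stated assumptions: `demo_msk`, its fields and `demo_data_ready` are hypothetical, a C11 `atomic_flag` stands in for the socket spinlock, and a counter stands in for `schedule_work()`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the mptcp-level socket state. */
struct demo_msk {
	atomic_flag slock;   /* stand-in for the socket spinlock */
	bool owned;          /* true while a process context owns the socket */
	int rx_queue_len;    /* skbs already moved to the mptcp-level queue */
	int deferred_work;   /* how often we fell back to the work queue */
};

static void demo_msk_init(struct demo_msk *msk)
{
	atomic_flag_clear(&msk->slock);
	msk->owned = false;
	msk->rx_queue_len = 0;
	msk->deferred_work = 0;
}

/* Returns true if the new skbs were processed right away. */
static bool demo_data_ready(struct demo_msk *msk, int new_skbs)
{
	bool moved = false;

	assert(new_skbs >= 0);
	if (!atomic_flag_test_and_set(&msk->slock)) {
		/* Lock taken: process inline only if nobody owns the socket. */
		if (!msk->owned) {
			msk->rx_queue_len += new_skbs;
			moved = true;
		}
		atomic_flag_clear(&msk->slock);
	}
	if (!moved)
		msk->deferred_work++;   /* stand-in for schedule_work() */
	return moved;
}
```

In this model the deferral counter only grows when the socket is busy, which mirrors the stated goal of avoiding the small work-item delay in the common case.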
    • mptcp: remove mptcp_read_actor · bfae9dae
      Florian Westphal committed
      Only used to discard stale data from the subflow, so move
      it where needed.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add rmem queue accounting · 600911ff
      Florian Westphal committed
      If userspace never drains the receive buffers we must stop draining
      the subflow socket(s) at some point.
      
      This adds the needed rmem accounting for this.
      If the threshold is reached, we stop draining the subflows.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
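The threshold behaviour described in this commit can be illustrated with a small user-space model. The names `demo_rx`, `demo_drain_subflow` and `demo_user_read` are hypothetical, and the exact comparison the kernel uses is an assumption here; the sketch only shows the idea of charging moved data to a counter and stopping once it exceeds the buffer limit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical receive-memory accounting state. */
struct demo_rx {
	unsigned int rmem_alloc;  /* bytes currently queued at mptcp level */
	unsigned int rcvbuf;      /* threshold above which draining stops */
};

/* Move packets from a subflow until the threshold is exceeded.
 * Returns how many packets were moved. */
static size_t demo_drain_subflow(struct demo_rx *rx,
				 const unsigned int *sizes, size_t n)
{
	size_t moved = 0;

	while (moved < n) {
		if (rx->rmem_alloc > rx->rcvbuf)
			break;                  /* userspace is not reading */
		rx->rmem_alloc += sizes[moved]; /* charge the moved packet */
		moved++;
	}
	return moved;
}

/* Userspace read: uncharge the bytes that were consumed. */
static void demo_user_read(struct demo_rx *rx, unsigned int bytes)
{
	assert(bytes <= rx->rmem_alloc);
	rx->rmem_alloc -= bytes;
}
```

Once the reader consumes data the counter drops below the limit again, so a later drain attempt can resume with the packets still waiting on the subflow.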
    • mptcp: update mptcp ack sequence from work queue · 6771bfd9
      Florian Westphal committed
      If userspace is not reading data, all the mptcp-level acks contain the
      ack_seq from the last time userspace read data rather than the most
      recent in-sequence value.
      
      This causes pointless retransmissions for data that is already queued.
      
      The reason for this is that all the mptcp protocol level processing
      happens at mptcp_recv time.
      
      This adds a work queue to move skbs from the subflow sockets' receive
      queue to the mptcp socket receive queue (which was not used so far).
      
      This allows us to announce the correct mptcp ack sequence in a timely
      fashion, even when the application does not call recv() on the mptcp socket
      for some time.
      
      We still wake userspace tasks waiting for POLLIN immediately:
      If the mptcp level receive queue is empty (because the work queue is
      still pending) it can be filled from in-sequence subflow sockets at
      recv time without a need to wait for the worker.
      
      The skb_orphan when moving skbs from subflow to mptcp level is needed,
      because the destructor (sock_rfree) relies on skb->sk (ssk!) lock
      being taken.
      
      A followup patch will add needed rmem accounting for the moved skbs.
      
      Other problem: In case application behaves as expected, and calls
      recv() as soon as mptcp socket becomes readable, the work queue will
      only waste cpu cycles.  This will also be addressed in followup patches.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add work queue skeleton · 80992017
      Paolo Abeni committed
      Will be extended with functionality in followup patches.
      Initial user is moving skbs from subflows receive queue to
      the mptcp-level receive queue.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add and use mptcp_data_ready helper · 101f6f85
      Florian Westphal committed
      Allows us to schedule the work queue to drain the ssk receive queue in
      a followup patch.
      
      This is needed to avoid sending all-too-pessimistic mptcp-level
      acknowledgements.  At this time, the ack_seq is what was last read by
      userspace instead of the highest in-sequence number queued for reading.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 19 Feb, 2020 1 commit
    • mptcp: fix bogus socket flag values · d99bfed5
      Florian Westphal committed
      Dan Carpenter reports static checker warnings due to bogus BIT() usage:
      
      net/mptcp/subflow.c:571 subflow_write_space() warn: test_bit() takes a bit number
      net/mptcp/subflow.c:694 subflow_state_change() warn: test_bit() takes a bit number
      net/mptcp/protocol.c:261 ssk_check_wmem() warn: test_bit() takes a bit number
      [..]
      
      This is harmless (we use bits 1 & 2 instead of 0 and 1), but would
      break eventually when adding BIT(5) (or 6, depends on size of 'long').
      
      Just use 0 and 1, the values are only passed to test/set/clear_bit
      functions.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
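The difference between a bit mask and a bit number can be shown with a small user-space model. The helpers `demo_set_bit` and `demo_test_bit` and the `DEMO_*` macros are hypothetical stand-ins written for this sketch; they only mimic the "takes a bit number" contract of the kernel's `set_bit()` and `test_bit()` on a single word.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BIT(nr)       (1UL << (nr))   /* a mask, not a bit number */

/* Flag values defined as masks: the mistake the checker warned about. */
#define DEMO_FLAG_A_WRONG  DEMO_BIT(0)     /* value 1, so bit 1 gets used */
#define DEMO_FLAG_B_WRONG  DEMO_BIT(1)     /* value 2, so bit 2 gets used */

/* Flag values defined as bit numbers: what the helpers expect. */
#define DEMO_FLAG_A        0
#define DEMO_FLAG_B        1

/* Single-word stand-ins that take a bit NUMBER. */
static void demo_set_bit(unsigned int nr, unsigned long *word)
{
	assert(nr < DEMO_BITS_PER_LONG);
	*word |= 1UL << nr;
}

static bool demo_test_bit(unsigned int nr, const unsigned long *word)
{
	assert(nr < DEMO_BITS_PER_LONG);
	return (*word >> nr) & 1UL;
}
```

With mask-valued flags the code still works, only shifted by one position. A later flag defined as `DEMO_BIT(5)` or `DEMO_BIT(6)` would name bit 32 or bit 64, which no longer fits in a 32-bit or 64-bit `long` respectively.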
  13. 17 Feb, 2020 2 commits
    • mptcp: select CRYPTO · 357b41ca
      Matthieu Baerts committed
      Without this modification and if CRYPTO is not selected, we have this
      warning:
      
        WARNING: unmet direct dependencies detected for CRYPTO_LIB_SHA256
          Depends on [n]: CRYPTO [=n]
          Selected by [y]:
          - MPTCP [=y] && NET [=y] && INET [=y]
      
      MPTCP selects CRYPTO_LIB_SHA256 which seems to depend on CRYPTO. CRYPTO
      is now selected to avoid this issue.
      
      Even though the config system prints that warning, it looks like
      sha256.c is compiled and linked even without CONFIG_CRYPTO. Since MPTCP
      will end up needing CONFIG_CRYPTO anyway in future commits -- currently
      in preparation for net-next -- we propose to add it now to fix the
      warning.
      
      The dependency in the config system comes from the fact that
      CRYPTO_LIB_SHA256 is defined in "lib/crypto/Kconfig" which is sourced
      from "crypto/Kconfig" only if CRYPTO is selected.
      
      Fixes: 65492c5a (mptcp: move from sha1 (v0) to sha256 (v1))
      Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: Protect subflow socket options before connection completes · b6e4a1ae
      Mat Martineau committed
      Userspace should not be able to directly manipulate subflow socket
      options before a connection is established since it is not yet known if
      it will be an MPTCP subflow or a TCP fallback subflow. TCP fallback
      subflows can be more directly controlled by userspace because they are
      regular TCP connections, while MPTCP subflow sockets need to be
      configured for the specific needs of MPTCP. Use the same logic as
      sendmsg/recvmsg to ensure that socket option calls are only passed
      through to known TCP fallback subflows.
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 10 Feb, 2020 1 commit
  15. 06 Feb, 2020 1 commit
    • mptcp: fix use-after-free for ipv6 · b0519de8
      Florian Westphal committed
      Turns out that when we accept a new subflow, the newly created
      inet_sk(tcp_sk)->pinet6 points at the ipv6_pinfo structure of the
      listener socket.
      
      This wasn't caught by the selftest because it closes the accepted fd
      before the listening one.
      
      Adding a close(listenfd) after accept returns is enough to trigger it:
       BUG: KASAN: use-after-free in inet6_getname+0x6ba/0x790
       Read of size 1 at addr ffff88810e310866 by task mptcp_connect/2518
       Call Trace:
        inet6_getname+0x6ba/0x790
        __sys_getpeername+0x10b/0x250
        __x64_sys_getpeername+0x6f/0xb0
      
      Also alter the test program to exercise this.
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 05 Feb, 2020 1 commit
    • mptcp: fix use-after-free on tcp fallback · 2c22c06c
      Florian Westphal committed
      When an mptcp socket connects to a tcp peer or when a middlebox interferes
      with tcp options, mptcp needs to fall back to plain tcp.
      Problem is that mptcp is trying to be too clever in this case:
      
      It attempts to close the mptcp meta sk and transparently replace it with
      the (only) subflow tcp sk.
      
      Unfortunately, this is racy -- the socket is already exposed to userspace.
      Any parallel calls to send/recv/setsockopt etc. can cause use-after-free:
      
      BUG: KASAN: use-after-free in atomic_try_cmpxchg include/asm-generic/atomic-instrumented.h:693 [inline]
      CPU: 1 PID: 2083 Comm: syz-executor.1 Not tainted 5.5.0 #2
       atomic_try_cmpxchg include/asm-generic/atomic-instrumented.h:693 [inline]
       queued_spin_lock include/asm-generic/qspinlock.h:78 [inline]
       do_raw_spin_lock include/linux/spinlock.h:181 [inline]
       __raw_spin_lock_bh include/linux/spinlock_api_smp.h:136 [inline]
       _raw_spin_lock_bh+0x71/0xd0 kernel/locking/spinlock.c:175
       spin_lock_bh include/linux/spinlock.h:343 [inline]
       __lock_sock+0x105/0x190 net/core/sock.c:2414
       lock_sock_nested+0x10f/0x140 net/core/sock.c:2938
       lock_sock include/net/sock.h:1516 [inline]
       mptcp_setsockopt+0x2f/0x1f0 net/mptcp/protocol.c:800
       __sys_setsockopt+0x152/0x240 net/socket.c:2130
       __do_sys_setsockopt net/socket.c:2146 [inline]
       __se_sys_setsockopt net/socket.c:2143 [inline]
       __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143
       do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      While the use-after-free can be resolved, there is another problem:
      sock->ops and sock->sk assignments are not atomic, i.e. we may get calls
      into mptcp functions with sock->sk already pointing at the subflow socket,
      or calls into tcp functions with a mptcp meta sk.
      
      Remove the fallback code and call the relevant functions for the (only)
      subflow in case the mptcp socket is connected to tcp peer.
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Diagnosed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 30 Jan, 2020 7 commits
    • mptcp: Fix undefined mptcp_handle_ipv6_mapped for modular IPV6 · 31484d56
      Geert Uytterhoeven committed
      If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m:
      
          ERROR: "mptcp_handle_ipv6_mapped" [net/ipv6/ipv6.ko] undefined!
      
      This does not happen if CONFIG_MPTCP_IPV6=y, as CONFIG_MPTCP_IPV6
      selects CONFIG_IPV6, and thus forces CONFIG_IPV6 builtin.
      
      As exporting a symbol for an empty function would be a bit wasteful, fix
      this by providing a dummy version of mptcp_handle_ipv6_mapped() for the
      CONFIG_MPTCP_IPV6=n case.
      
      Rename mptcp_handle_ipv6_mapped() to mptcpv6_handle_mapped(), to make it
      clear this is a pure-IPV6 function, just like mptcpv6_init().
      
      Fixes: cec37a6e ("mptcp: Handle MP_CAPABLE options for outgoing connections")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
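The "dummy version for the disabled case" pattern mentioned in this commit is commonly written as a `static inline` stub behind a preprocessor check. The sketch below is a user-space illustration only: `DEMO_MPTCP_IPV6` is a hypothetical macro standing in for the config option, and the function names and signatures are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef DEMO_MPTCP_IPV6
/* Feature enabled: the real implementation lives in another object file. */
void demo_mptcpv6_handle_mapped(int *state, bool mapped);
#else
/* Feature disabled: an empty inline stub, so callers still compile and
 * no symbol has to be exported for a function that does nothing. */
static inline void demo_mptcpv6_handle_mapped(int *state, bool mapped)
{
	(void)state;
	(void)mapped;
}
#endif

/* The caller is identical in both configurations. */
static int demo_connect_path(int *state, bool mapped)
{
	assert(state != NULL);
	demo_mptcpv6_handle_mapped(state, mapped);
	return *state;
}
```

Because the stub is inline and empty, the compiler can drop the call entirely in the disabled configuration, which fits the commit's remark that exporting a symbol for an empty function would be wasteful.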
    • mptcp: MPTCP_HMAC_TEST should depend on MPTCP · 389b8fb3
      Geert Uytterhoeven committed
      As the MPTCP HMAC test is integrated into the MPTCP code, it can be
      built only when MPTCP is enabled.  Hence when MPTCP is disabled, asking
      the user if the test code should be enabled is futile.
      
      Wrap the whole block of MPTCP-specific config options inside a check for
      MPTCP.  While at it, drop the "default n" for MPTCP_HMAC_TEST, as that
      is the default anyway.
      
      Fixes: 65492c5a ("mptcp: move from sha1 (v0) to sha256 (v1)")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: Fix incorrect IPV6 dependency check · 8e1974a2
      Geert Uytterhoeven committed
      If CONFIG_MPTCP=y, CONFIG_MPTCP_IPV6=n, and CONFIG_IPV6=m:
      
          net/mptcp/protocol.o: In function `__mptcp_tcp_fallback':
          protocol.c:(.text+0x786): undefined reference to `inet6_stream_ops'
      
      Fix this by checking for CONFIG_MPTCP_IPV6 instead of CONFIG_IPV6, like
      is done in all other places in the mptcp code.
      
      Fixes: 8ab183de ("mptcp: cope with later TCP fallback")
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: handle tcp fallback when using syn cookies · ae2dd716
      Florian Westphal committed
      We can't deal with syncookie mode yet: the syncookie rx path creates a
      tcp reqsk, so we get OOB access because we treat the tcp reqsk as an mptcp one:
      
      TCP: SYN flooding on port 20002. Sending cookies.
      BUG: KASAN: slab-out-of-bounds in subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
      Read of size 1 at addr ffff8881167bc148 by task syz-executor099/2120
       subflow_syn_recv_sock+0x451/0x4d0 net/mptcp/subflow.c:191
       tcp_get_cookie_sock+0xcf/0x520 net/ipv4/syncookies.c:209
       cookie_v6_check+0x15a5/0x1e90 net/ipv6/syncookies.c:252
       tcp_v6_cookie_check net/ipv6/tcp_ipv6.c:1123 [inline]
       [..]
      
      Bug can be reproduced via "sysctl net.ipv4.tcp_syncookies=2".
      
      Note that MPTCP should work with syncookies (4th ack would carry needed
      state), but it appears better to sort that out in -next so do tcp
      fallback for now.
      
      I removed the MPTCP ifdef for tcp_rsk "is_mptcp" member because
      if (IS_ENABLED()) is easier to read than "#ifdef IS_ENABLED()/#endif" pair.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Fixes: cec37a6e ("mptcp: Handle MP_CAPABLE options for outgoing connections")
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Tested-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: avoid a lockdep splat when mcast group was joined · b2c5b614
      Florian Westphal committed
      syzbot triggered the following lockdep splat:
      
      ffffffff82d2cd40 (rtnl_mutex){+.+.}, at: ip_mc_drop_socket+0x52/0x180
      but task is already holding lock:
      ffff8881187a2310 (sk_lock-AF_INET){+.+.}, at: mptcp_close+0x18/0x30
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      -> #1 (sk_lock-AF_INET){+.+.}:
             lock_acquire+0xee/0x230
             lock_sock_nested+0x89/0xc0
             do_ip_setsockopt.isra.0+0x335/0x22f0
             ip_setsockopt+0x35/0x60
             tcp_setsockopt+0x5d/0x90
             __sys_setsockopt+0xf3/0x190
             __x64_sys_setsockopt+0x61/0x70
             do_syscall_64+0x72/0x300
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      -> #0 (rtnl_mutex){+.+.}:
             check_prevs_add+0x2b7/0x1210
             __lock_acquire+0x10b6/0x1400
             lock_acquire+0xee/0x230
             __mutex_lock+0x120/0xc70
             ip_mc_drop_socket+0x52/0x180
             inet_release+0x36/0xe0
             __sock_release+0xfd/0x130
             __mptcp_close+0xa8/0x1f0
             inet_release+0x7f/0xe0
             __sock_release+0x69/0x130
             sock_close+0x18/0x20
             __fput+0x179/0x400
             task_work_run+0xd5/0x110
             do_exit+0x685/0x1510
             do_group_exit+0x7e/0x170
             __x64_sys_exit_group+0x28/0x30
             do_syscall_64+0x72/0x300
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      The trigger is:
        socket(AF_INET, SOCK_STREAM, 0x106 /* IPPROTO_MPTCP */) = 4
        setsockopt(4, SOL_IP, MCAST_JOIN_GROUP, {gr_interface=7, gr_group={sa_family=AF_INET, sin_port=htons(20003), sin_addr=inet_addr("224.0.0.2")}}, 136) = 0
        exit(0)
      
      Which results in a call to rtnl_lock while we are holding
      the parent mptcp socket lock via
      mptcp_close -> lock_sock(msk) -> inet_release -> ip_mc_drop_socket -> rtnl_lock().
      
      From lockdep point of view we thus have both
      'rtnl_lock; lock_sock' and 'lock_sock; rtnl_lock'.
      
      Fix this by stealing the msk conn_list and doing the subflow close
      without holding the msk lock.
      
      Fixes: cec37a6e ("mptcp: Handle MP_CAPABLE options for outgoing connections")
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: fix panic on user pointer access · 50e741bb
      Florian Westphal committed
      It's not possible to call the kernel_(s|g)etsockopt functions here,
      because the address points to user memory:
      
      General protection fault in user access. Non-canonical address?
      WARNING: CPU: 1 PID: 5352 at arch/x86/mm/extable.c:77 ex_handler_uaccess+0xba/0xe0 arch/x86/mm/extable.c:77
      Kernel panic - not syncing: panic_on_warn set ...
      [..]
      Call Trace:
       fixup_exception+0x9d/0xcd arch/x86/mm/extable.c:178
       general_protection+0x2d/0x40 arch/x86/entry/entry_64.S:1202
       do_ip_getsockopt+0x1f6/0x1860 net/ipv4/ip_sockglue.c:1323
       ip_getsockopt+0x87/0x1c0 net/ipv4/ip_sockglue.c:1561
       tcp_getsockopt net/ipv4/tcp.c:3691 [inline]
       tcp_getsockopt+0x8c/0xd0 net/ipv4/tcp.c:3685
       kernel_getsockopt+0x121/0x1f0 net/socket.c:3736
       mptcp_getsockopt+0x69/0x90 net/mptcp/protocol.c:830
       __sys_getsockopt+0x13a/0x220 net/socket.c:2175
      
      We can call tcp_get/setsockopt functions instead.  Doing so fixes
      crashing, but still leaves rtnl related lockdep splat:
      
           WARNING: possible circular locking dependency detected
           5.5.0-rc6 #2 Not tainted
           ------------------------------------------------------
           syz-executor.0/16334 is trying to acquire lock:
           ffffffff84f7a080 (rtnl_mutex){+.+.}, at: do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644
           but task is already holding lock:
           ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: lock_sock include/net/sock.h:1516 [inline]
           ffff888116503b90 (sk_lock-AF_INET){+.+.}, at: mptcp_setsockopt+0x28/0x90 net/mptcp/protocol.c:1284
      
           which lock already depends on the new lock.
           the existing dependency chain (in reverse order) is:
      
           -> #1 (sk_lock-AF_INET){+.+.}:
                  lock_sock_nested+0xca/0x120 net/core/sock.c:2944
                  lock_sock include/net/sock.h:1516 [inline]
                  do_ip_setsockopt.isra.0+0x281/0x3820 net/ipv4/ip_sockglue.c:645
                  ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248
                  udp_setsockopt+0x5d/0xa0 net/ipv4/udp.c:2639
                  __sys_setsockopt+0x152/0x240 net/socket.c:2130
                  __do_sys_setsockopt net/socket.c:2146 [inline]
                  __se_sys_setsockopt net/socket.c:2143 [inline]
                  __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143
                  do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294
                  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
           -> #0 (rtnl_mutex){+.+.}:
                  check_prev_add kernel/locking/lockdep.c:2475 [inline]
                  check_prevs_add kernel/locking/lockdep.c:2580 [inline]
                  validate_chain kernel/locking/lockdep.c:2970 [inline]
                  __lock_acquire+0x1fb2/0x4680 kernel/locking/lockdep.c:3954
                  lock_acquire+0x127/0x330 kernel/locking/lockdep.c:4484
                  __mutex_lock_common kernel/locking/mutex.c:956 [inline]
                  __mutex_lock+0x158/0x1340 kernel/locking/mutex.c:1103
                  do_ip_setsockopt.isra.0+0x277/0x3820 net/ipv4/ip_sockglue.c:644
                  ip_setsockopt+0x44/0xf0 net/ipv4/ip_sockglue.c:1248
                  tcp_setsockopt net/ipv4/tcp.c:3159 [inline]
                  tcp_setsockopt+0x8c/0xd0 net/ipv4/tcp.c:3153
                  kernel_setsockopt+0x121/0x1f0 net/socket.c:3767
                  mptcp_setsockopt+0x69/0x90 net/mptcp/protocol.c:1288
                  __sys_setsockopt+0x152/0x240 net/socket.c:2130
                  __do_sys_setsockopt net/socket.c:2146 [inline]
                  __se_sys_setsockopt net/socket.c:2143 [inline]
                  __x64_sys_setsockopt+0xba/0x150 net/socket.c:2143
                  do_syscall_64+0xbd/0x5b0 arch/x86/entry/common.c:294
                  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
           other info that might help us debug this:
      
            Possible unsafe locking scenario:
      
                  CPU0                    CPU1
                  ----                    ----
             lock(sk_lock-AF_INET);
                                          lock(rtnl_mutex);
                                          lock(sk_lock-AF_INET);
             lock(rtnl_mutex);
      
      The lockdep complaint is because we hold mptcp socket lock when calling
      the sk_prot get/setsockopt handler, and those might need to acquire the
      rtnl mutex.  Normally, order is:
      
      rtnl_lock(sk) -> lock_sock
      
      Whereas for mptcp the order is
      
      lock_sock(mptcp_sk) -> rtnl_lock -> lock_sock(subflow_sk)
      
      We can avoid this by releasing the mptcp socket lock early, but, as Paolo
      points out, we need to get/put the subflow socket refcount before doing so
      to avoid race with concurrent close().
      
      Fixes: 717e79c8 ("mptcp: Add setsockopt()/getsockopt() socket operations")
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: defer freeing of cached ext until last moment · c9fd9c5f
      Florian Westphal committed
      Access to msk->cached_ext is only legal if the msk is locked or all
      concurrent accesses are impossible.
      
      Furthermore, once we start to tear down, we must make sure nothing else
      can step in and allocate a new cached ext.
      
      So place this code in the destroy callback where it belongs.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 29 Jan, 2020 1 commit
  19. 25 Jan, 2020 2 commits
    • mptcp: Fix code formatting · edc7e489
      Mat Martineau committed
      checkpatch.pl had a few complaints in the last set of MPTCP patches:
      
      ERROR: code indent should use tabs where possible
      +^I         subflow, sk->sk_family, icsk->icsk_af_ops, target, mapped);$
      
      CHECK: Comparison to NULL could be written "!new_ctx"
      +	if (new_ctx == NULL) {
      
      ERROR: "foo * bar" should be "foo *bar"
      +static const struct proto_ops * tcp_proto_ops(struct sock *sk)
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: do not inherit inet proto ops · e42f1ac6
      Florian Westphal committed
      We need to initialise the struct ourselves, else we expose tcp-specific
      callbacks such as tcp_splice_read which will then trigger splat because
      the socket is an mptcp one:
      
      BUG: KASAN: slab-out-of-bounds in tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
      Write of size 8 at addr ffff888116aa21d0 by task syz-executor.0/5478
      
      CPU: 1 PID: 5478 Comm: syz-executor.0 Not tainted 5.5.0-rc6 #3
      Call Trace:
       tcp_mstamp_refresh+0x80/0xa0 net/ipv4/tcp_output.c:57
       tcp_rcv_space_adjust+0x72/0x7f0 net/ipv4/tcp_input.c:612
       tcp_read_sock+0x622/0x990 net/ipv4/tcp.c:1674
       tcp_splice_read+0x20b/0xb40 net/ipv4/tcp.c:791
       do_splice+0x1259/0x1560 fs/splice.c:1205
      
      To prevent build error with ipv6, add the recv/sendmsg function
      declaration to ipv6.h.  The functions are already accessible "thanks"
      to retpoline related work, but they are currently only made visible
      by socket.c specific INDIRECT_CALLABLE macros.
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 24 Jan, 2020 2 commits
    • mptcp: cope with later TCP fallback · 8ab183de
      Paolo Abeni committed
      With MPTCP v1, passive connections can fall back to TCP after the
      subflow becomes established:
      
      syn + MP_CAPABLE ->
                     <- syn, ack + MP_CAPABLE
      
      ack, seq = 3    ->
              // OoO packet is accepted because in-sequence
              // passive socket is created, is in ESTABLISHED
              // status and tentatively as MP_CAPABLE
      
      ack, seq = 2     ->
              // no MP_CAPABLE opt, subflow should fallback to TCP
      
      We can't use the 'subflow' socket fallback, as we don't have
      it available for passive connection.
      
      Instead, when the fallback is detected, replace the mptcp
      socket with the underlying TCP subflow. Beyond covering
      the above scenario, it makes a TCP fallback socket as efficient
      as plain TCP ones.
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: process MP_CAPABLE data option · d22f4988
      Christoph Paasch committed
      This patch implements the handling of MP_CAPABLE + data option, as per
      RFC 6824 bis / RFC 8684: MPTCP v1.
      
      On the server side we can receive the remote key after the connection
      is established. We need to explicitly track the 'missing remote key'
      status and avoid emitting a mptcp ack until we get such info.
      
      When a late/retransmitted/OoO pkt carrying MP_CAPABLE[+data] option
      is received, we have to propagate the mptcp seq number info to
      the msk socket. To avoid ABBA locking issue, explicitly check for
      that in recvmsg(), where we own msk and subflow sock locks.
      
      The above also means that an established mp_capable subflow - still
      waiting for the remote key - can be 'downgraded' to plain TCP.
      
      Such change could potentially block a reader waiting for new data
      forever - as they hook to msk, while later wake-up after the downgrade
      will be on subflow only.
      
      The above issue is not handled here, we likely have to get rid of
      msk->fallback to handle that cleanly.
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>