1. 31 5月, 2020 2 次提交
    • P
      mptcp: fix race between MP_JOIN and close · 10f6d46c
      Paolo Abeni 提交于
      If a MP_JOIN subflow completes the 3whs while another
      CPU is closing the master msk, we can hit the
      following race:
      
      CPU1                                    CPU2
      
      close()
       mptcp_close
                                              subflow_syn_recv_sock
                                               mptcp_token_get_sock
                                               mptcp_finish_join
                                                inet_sk_state_load
        mptcp_token_destroy
        inet_sk_state_store(TCP_CLOSE)
        __mptcp_flush_join_list()
                                                mptcp_sock_graft
                                                list_add_tail
        sk_common_release
         sock_orphan()
       <socket free>
      
      The MP_JOIN socket will be leaked. Additionally we can hit
      UaF for the msk 'struct socket' referenced via the 'conn'
      field.
      
      This change try to address the issue introducing some
      synchronization between the MP_JOIN 3whs and mptcp_close
      via the join_list spinlock. If we detect the msk is closing
      the MP_JOIN socket is closed, too.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10f6d46c
    • P
      mptcp: fix unblocking connect() · 41be81a8
      Paolo Abeni 提交于
      Currently unblocking connect() on MPTCP sockets fails frequently.
      If mptcp_stream_connect() is invoked to complete a previously
      attempted unblocking connection, it will still try to create
      the first subflow via __mptcp_socket_create(). If the 3whs is
      completed and the 'can_ack' flag is already set, the latter
      will fail with -EINVAL.
      
      This change addresses the issue checking for pending connect and
      delegating the completion to the first subflow. Additionally
      do msk addresses and sk_state changes only when needed.
      
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      41be81a8
  2. 27 5月, 2020 1 次提交
    • P
      mptcp: avoid NULL-ptr derefence on fallback · 0a82e230
      Paolo Abeni 提交于
      In the MPTCP receive path we must cope with TCP fallback
      on blocking recvmsg(). Currently in such code path we detect
      the fallback condition, but we don't fetch the struct socket
      required for fallback.
      
      The above allowed syzkaller to trigger a NULL pointer
      dereference:
      
      general protection fault, probably for non-canonical address 0xdffffc0000000004: 0000 [#1] PREEMPT SMP KASAN
      KASAN: null-ptr-deref in range [0x0000000000000020-0x0000000000000027]
      CPU: 1 PID: 7226 Comm: syz-executor523 Not tainted 5.7.0-rc6-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:sock_recvmsg_nosec net/socket.c:886 [inline]
      RIP: 0010:sock_recvmsg+0x92/0x110 net/socket.c:904
      Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 44 89 6c 24 04 e8 53 18 1d fb 4d 8d 6f 20 4c 89 e8 48 c1 e8 03 48 b9 00 00 00 00 00 fc ff df <80> 3c 08 00 74 08 4c 89 ef e8 20 12 5b fb bd a0 00 00 00 49 03 6d
      RSP: 0018:ffffc90001077b98 EFLAGS: 00010202
      RAX: 0000000000000004 RBX: ffffc90001077dc0 RCX: dffffc0000000000
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      RBP: 0000000000000000 R08: ffffffff86565e59 R09: ffffed10115afeaa
      R10: ffffed10115afeaa R11: 0000000000000000 R12: 1ffff9200020efbc
      R13: 0000000000000020 R14: ffffc90001077de0 R15: 0000000000000000
      FS:  00007fc6a3abe700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004d0050 CR3: 00000000969f0000 CR4: 00000000001406e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       mptcp_recvmsg+0x18d5/0x19b0 net/mptcp/protocol.c:891
       inet_recvmsg+0xf6/0x1d0 net/ipv4/af_inet.c:838
       sock_recvmsg_nosec net/socket.c:886 [inline]
       sock_recvmsg net/socket.c:904 [inline]
       __sys_recvfrom+0x2f3/0x470 net/socket.c:2057
       __do_sys_recvfrom net/socket.c:2075 [inline]
       __se_sys_recvfrom net/socket.c:2071 [inline]
       __x64_sys_recvfrom+0xda/0xf0 net/socket.c:2071
       do_syscall_64+0xf3/0x1b0 arch/x86/entry/common.c:295
       entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Address the issue initializing the struct socket reference
      before entering the fallback code.
      
      Reported-and-tested-by: syzbot+c6bfc3db991edc918432@syzkaller.appspotmail.com
      Suggested-by: NOndrej Mosnacek <omosnace@redhat.com>
      Fixes: 8ab183de ("mptcp: cope with later TCP fallback")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a82e230
  3. 13 5月, 2020 1 次提交
  4. 01 5月, 2020 1 次提交
    • P
      mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Paolo Abeni 提交于
      The mptcp_options_received structure carries several per
      packet flags (mp_capable, mp_join, etc.). Such fields must
      be cleared on each packet, even on dropped ones or packet
      not carrying any MPTCP options, but the current mptcp
      code clears them only on TCP option reset.
      
      On several races/corner cases we end-up with stray bits in
      incoming options, leading to WARN_ON splats. e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue clearing explicitly the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.
      
      This allows us dropping an MPTCP hook from the TCP code and
      removing the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options(). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP socket will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfde141e
  5. 30 4月, 2020 1 次提交
    • F
      mptcp: replace mptcp_disconnect with a stub · 42c556fe
      Florian Westphal 提交于
      Paolo points out that mptcp_disconnect is bogus:
      "lock_sock(sk);
      looks suspicious (lock should be already held by the caller)
      And call to: tcp_disconnect(sk, flags); too, sk is not a tcp
      socket".
      
      ->disconnect() gets called from e.g. inet_stream_connect when
      one tries to disassociate a connected socket again (to re-connect
      without closing the socket first).
      MPTCP however uses mptcp_stream_connect, not inet_stream_connect,
      for the mptcp-socket connect call.
      
      inet_stream_connect only gets called indirectly, for the tcp socket,
      so any ->disconnect() calls end up calling tcp_disconnect for that
      tcp subflow sk.
      
      This also explains why syzkaller has not yet reported a problem
      here.  So for now replace this with a stub that doesn't do anything.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/14Acked-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42c556fe
  6. 21 4月, 2020 2 次提交
    • P
      mptcp: drop req socket remote_key* fields · fca5c82c
      Paolo Abeni 提交于
      We don't need them, as we can use the current ingress opt
      data instead. Setting them in syn_recv_sock() may causes
      inconsistent mptcp socket status, as per previous commit.
      
      Fixes: cc7972ea ("mptcp: parse and emit MP_CAPABLE option according to v1 spec")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fca5c82c
    • F
      mptcp: handle mptcp listener destruction via rcu · 5e20087d
      Florian Westphal 提交于
      Following splat can occur during self test:
      
       BUG: KASAN: use-after-free in subflow_data_ready+0x156/0x160
       Read of size 8 at addr ffff888100c35c28 by task mptcp_connect/4808
      
        subflow_data_ready+0x156/0x160
        tcp_child_process+0x6a3/0xb30
        tcp_v4_rcv+0x2231/0x3730
        ip_protocol_deliver_rcu+0x5c/0x860
        ip_local_deliver_finish+0x220/0x360
        ip_local_deliver+0x1c8/0x4e0
        ip_rcv_finish+0x1da/0x2f0
        ip_rcv+0xd0/0x3c0
        __netif_receive_skb_one_core+0xf5/0x160
        __netif_receive_skb+0x27/0x1c0
        process_backlog+0x21e/0x780
        net_rx_action+0x35f/0xe90
        do_softirq+0x4c/0x50
        [..]
      
      This occurs when accessing subflow_ctx->conn.
      
      Problem is that tcp_child_process() calls listen sockets'
      sk_data_ready() notification, but it doesn't hold the listener
      lock.  Another cpu calling close() on the listener will then cause
      transition of refcount to 0.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e20087d
  7. 19 4月, 2020 2 次提交
    • F
      mptcp: fix 'Attempt to release TCP socket in state' warnings · 9f5ca6a5
      Florian Westphal 提交于
      We need to set sk_state to CLOSED, else we will get following:
      
      IPv4: Attempt to release TCP socket in state 3 00000000b95f109e
      IPv4: Attempt to release TCP socket in state 10 00000000b95f109e
      
      First one is from inet_sock_destruct(), second one from
      mptcp_sk_clone failure handling.  Setting sk_state to CLOSED isn't
      enough, we also need to orphan sk so it has DEAD flag set.
      Otherwise, a very similar warning is printed from inet_sock_destruct().
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9f5ca6a5
    • F
      mptcp: fix splat when incoming connection is never accepted before exit/close · df1036da
      Florian Westphal 提交于
      Following snippet (replicated from syzkaller reproducer) generates
      warning: "IPv4: Attempt to release TCP socket in state 1".
      
      int main(void) {
       struct sockaddr_in sin1 = { .sin_family = 2, .sin_port = 0x4e20,
                                   .sin_addr.s_addr = 0x010000e0, };
       struct sockaddr_in sin2 = { .sin_family = 2,
      	                     .sin_addr.s_addr = 0x0100007f, };
       struct sockaddr_in sin3 = { .sin_family = 2, .sin_port = 0x4e20,
      	                     .sin_addr.s_addr = 0x0100007f, };
       int r0 = socket(0x2, 0x1, 0x106);
       int r1 = socket(0x2, 0x1, 0x106);
      
       bind(r1, (void *)&sin1, sizeof(sin1));
       connect(r1, (void *)&sin2, sizeof(sin2));
       listen(r1, 3);
       return connect(r0, (void *)&sin3, 0x4d);
      }
      
      Reason is that the newly generated mptcp socket is closed via the ulp
      release of the tcp listener socket when its accept backlog gets purged.
      
      To fix this, delay setting the ESTABLISHED state until after userspace
      calls accept and via mptcp specific destructor.
      
      Fixes: 58b09919 ("mptcp: create msk early")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/9Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df1036da
  8. 13 4月, 2020 1 次提交
    • F
      mptcp: fix double-unlock in mptcp_poll · e154659b
      Florian Westphal 提交于
      mptcp_connect/28740 is trying to release lock (sk_lock-AF_INET) at:
      [<ffffffff82c15869>] mptcp_poll+0xb9/0x550
      but there are no more locks to release!
      Call Trace:
       lock_release+0x50f/0x750
       release_sock+0x171/0x1b0
       mptcp_poll+0xb9/0x550
       sock_poll+0x157/0x470
       ? get_net_ns+0xb0/0xb0
       do_sys_poll+0x63c/0xdd0
      
      Problem is that __mptcp_tcp_fallback() releases the mptcp socket lock,
      but after recent change it doesn't do this in all of its return paths.
      
      To fix this, remove the unlock from __mptcp_tcp_fallback() and
      always do the unlock in the caller.
      
      Also add a small comment as to why we have this
      __mptcp_needs_tcp_fallback().
      
      Fixes: 0b4f33de ("mptcp: fix tcp fallback crash")
      Reported-by: syzbot+e56606435b7bfeea8cf5@syzkaller.appspotmail.com
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e154659b
  9. 02 4月, 2020 3 次提交
    • F
      mptcp: re-check dsn before reading from subflow · de06f573
      Florian Westphal 提交于
      mptcp_subflow_data_available() is commonly called via
      ssk->sk_data_ready(), in this case the mptcp socket lock
      cannot be acquired.
      
      Therefore, while we can safely discard subflow data that
      was already received up to msk->ack_seq, we cannot be sure
      that 'subflow->data_avail' will still be valid at the time
      userspace wants to read the data -- a previous read on a
      different subflow might have carried this data already.
      
      In that (unlikely) event, msk->ack_seq will have been updated
      and will be ahead of the subflow dsn.
      
      We can check for this condition and skip/resync to the expected
      sequence number.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      de06f573
    • F
      mptcp: subflow: check parent mptcp socket on subflow state change · 59832e24
      Florian Westphal 提交于
      This is needed at least until proper MPTCP-Level fin/reset
      signalling gets added:
      
      We wake parent when a subflow changes, but we should do this only
      when all subflows have closed, not just one.
      
      Schedule the mptcp worker and tell it to check eof state on all
      subflows.
      
      Only flag mptcp socket as closed and wake userspace processes blocking
      in poll if all subflows have closed.
      Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59832e24
    • F
      mptcp: fix tcp fallback crash · 0b4f33de
      Florian Westphal 提交于
      Christoph Paasch reports following crash:
      
      general protection fault [..]
      CPU: 0 PID: 2874 Comm: syz-executor072 Not tainted 5.6.0-rc5 #62
      RIP: 0010:__pv_queued_spin_lock_slowpath kernel/locking/qspinlock.c:471
      [..]
       queued_spin_lock_slowpath arch/x86/include/asm/qspinlock.h:50 [inline]
       do_raw_spin_lock include/linux/spinlock.h:181 [inline]
       spin_lock_bh include/linux/spinlock.h:343 [inline]
       __mptcp_flush_join_list+0x44/0xb0 net/mptcp/protocol.c:278
       mptcp_shutdown+0xb3/0x230 net/mptcp/protocol.c:1882
      [..]
      
      Problem is that mptcp_shutdown() socket isn't an mptcp socket,
      its a plain tcp_sk.  Thus, trying to access mptcp_sk specific
      members accesses garbage.
      
      Root cause is that accept() returns a fallback (tcp) socket, not an mptcp
      one.  There is code in getpeername to detect this and override the sockets
      stream_ops.  But this will only run when accept() caller provided a
      sockaddr struct.  "accept(fd, NULL, 0)" will therefore result in
      mptcp stream ops, but with sock->sk pointing at a tcp_sk.
      
      Update the existing fallback handling to detect this as well.
      
      Moreover, mptcp_shutdown did not have fallback handling, and
      mptcp_poll did it too late so add that there as well.
      Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Tested-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b4f33de
  10. 30 3月, 2020 12 次提交
  11. 24 3月, 2020 1 次提交
  12. 18 3月, 2020 1 次提交
  13. 15 3月, 2020 1 次提交
  14. 12 3月, 2020 1 次提交
    • D
      net: mptcp: don't hang before sending 'MP capable with data' · 767d3ded
      Davide Caratti 提交于
      the following packetdrill script
      
        socket(..., SOCK_STREAM, IPPROTO_MPTCP) = 3
        fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
        fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
        connect(3, ..., ...) = -1 EINPROGRESS (Operation now in progress)
        > S 0:0(0) <mss 1460,sackOK,TS val 100 ecr 0,nop,wscale 8,mpcapable v1 flags[flag_h] nokey>
        < S. 0:0(0) ack 1 win 65535 <mss 1460,sackOK,TS val 700 ecr 100,nop,wscale 8,mpcapable v1 flags[flag_h] key[skey=2]>
        > . 1:1(0) ack 1 win 256 <nop, nop, TS val 100 ecr 700,mpcapable v1 flags[flag_h] key[ckey,skey]>
        getsockopt(3, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
        fcntl(3, F_SETFL, O_RDWR) = 0
        write(3, ..., 1000) = 1000
      
      doesn't transmit 1KB data packet after a successful three-way-handshake,
      using mp_capable with data as required by protocol v1, and write() hangs
      forever:
      
       PID: 973    TASK: ffff97dd399cae80  CPU: 1   COMMAND: "packetdrill"
        #0 [ffffa9b94062fb78] __schedule at ffffffff9c90a000
        #1 [ffffa9b94062fc08] schedule at ffffffff9c90a4a0
        #2 [ffffa9b94062fc18] schedule_timeout at ffffffff9c90e00d
        #3 [ffffa9b94062fc90] wait_woken at ffffffff9c120184
        #4 [ffffa9b94062fcb0] sk_stream_wait_connect at ffffffff9c75b064
        #5 [ffffa9b94062fd20] mptcp_sendmsg at ffffffff9c8e801c
        #6 [ffffa9b94062fdc0] sock_sendmsg at ffffffff9c747324
        #7 [ffffa9b94062fdd8] sock_write_iter at ffffffff9c7473c7
        #8 [ffffa9b94062fe48] new_sync_write at ffffffff9c302976
        #9 [ffffa9b94062fed0] vfs_write at ffffffff9c305685
       #10 [ffffa9b94062ff00] ksys_write at ffffffff9c305985
       #11 [ffffa9b94062ff38] do_syscall_64 at ffffffff9c004475
       #12 [ffffa9b94062ff50] entry_SYSCALL_64_after_hwframe at ffffffff9ca0008c
           RIP: 00007f959407eaf7  RSP: 00007ffe9e95a910  RFLAGS: 00000293
           RAX: ffffffffffffffda  RBX: 0000000000000008  RCX: 00007f959407eaf7
           RDX: 00000000000003e8  RSI: 0000000001785fe0  RDI: 0000000000000008
           RBP: 0000000001785fe0   R8: 0000000000000000   R9: 0000000000000003
           R10: 0000000000000007  R11: 0000000000000293  R12: 00000000000003e8
           R13: 00007ffe9e95ae30  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      Fix it ensuring that socket state is TCP_ESTABLISHED on reception of the
      third ack.
      
      Fixes: 1954b860 ("mptcp: Check connection state before attempting send")
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      767d3ded
  15. 10 3月, 2020 1 次提交
  16. 04 3月, 2020 2 次提交
  17. 27 2月, 2020 7 次提交
    • P
      mptcp: add dummy icsk_sync_mss() · dc24f8b4
      Paolo Abeni 提交于
      syzbot noted that the master MPTCP socket lacks the icsk_sync_mss
      callback, and was able to trigger a null pointer dereference:
      
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      PGD 8e171067 P4D 8e171067 PUD 93fa2067 PMD 0
      Oops: 0010 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 8984 Comm: syz-executor066 Not tainted 5.6.0-rc2-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:0x0
      Code: Bad RIP value.
      RSP: 0018:ffffc900020b7b80 EFLAGS: 00010246
      RAX: 1ffff110124ba600 RBX: 0000000000000000 RCX: ffff88809fefa600
      RDX: ffff8880994cdb18 RSI: 0000000000000000 RDI: ffff8880925d3140
      RBP: ffffc900020b7bd8 R08: ffffffff870225be R09: fffffbfff140652a
      R10: fffffbfff140652a R11: 0000000000000000 R12: ffff8880925d35d0
      R13: ffff8880925d3140 R14: dffffc0000000000 R15: 1ffff110124ba6ba
      FS:  0000000001a0b880(0000) GS:ffff8880aea00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffffffffffffffd6 CR3: 00000000a6d6f000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       cipso_v4_sock_setattr+0x34b/0x470 net/ipv4/cipso_ipv4.c:1888
       netlbl_sock_setattr+0x2a7/0x310 net/netlabel/netlabel_kapi.c:989
       smack_netlabel security/smack/smack_lsm.c:2425 [inline]
       smack_inode_setsecurity+0x3da/0x4a0 security/smack/smack_lsm.c:2716
       security_inode_setsecurity+0xb2/0x140 security/security.c:1364
       __vfs_setxattr_noperm+0x16f/0x3e0 fs/xattr.c:197
       vfs_setxattr fs/xattr.c:224 [inline]
       setxattr+0x335/0x430 fs/xattr.c:451
       __do_sys_fsetxattr fs/xattr.c:506 [inline]
       __se_sys_fsetxattr+0x130/0x1b0 fs/xattr.c:495
       __x64_sys_fsetxattr+0xbf/0xd0 fs/xattr.c:495
       do_syscall_64+0xf7/0x1c0 arch/x86/entry/common.c:294
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x440199
      Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007ffcadc19e48 EFLAGS: 00000246 ORIG_RAX: 00000000000000be
      RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440199
      RDX: 0000000020000200 RSI: 00000000200001c0 RDI: 0000000000000003
      RBP: 00000000006ca018 R08: 0000000000000003 R09: 00000000004002c8
      R10: 0000000000000009 R11: 0000000000000246 R12: 0000000000401a20
      R13: 0000000000401ab0 R14: 0000000000000000 R15: 0000000000000000
      Modules linked in:
      CR2: 0000000000000000
      
      Address the issue adding a dummy icsk_sync_mss callback.
      To properly sync the subflows mss and options list we need some
      additional infrastructure, which will land to net-next.
      
      Reported-by: syzbot+f4dfece964792d80b139@syzkaller.appspotmail.com
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc24f8b4
    • P
      mptcp: defer work schedule until mptcp lock is released · 14c441b5
      Paolo Abeni 提交于
      Don't schedule the work queue right away, instead defer this
      to the lock release callback.
      
      This has the advantage that it will give recv path a chance to
      complete -- this might have moved all pending packets from the
      subflow to the mptcp receive queue, which allows to avoid the
      schedule_work().
      Co-developed-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14c441b5
    • F
      mptcp: avoid work queue scheduling if possible · 2e52213c
      Florian Westphal 提交于
      We can't lock_sock() the mptcp socket from the subflow data_ready callback,
      it would result in ABBA deadlock with the subflow socket lock.
      
      We can however grab the spinlock: if that succeeds and the mptcp socket
      is not owned at the moment, we can process the new skbs right away
      without deferring this to the work queue.
      
      This avoids the schedule_work and hence the small delay until the
      work item is processed.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e52213c
    • F
      mptcp: remove mptcp_read_actor · bfae9dae
      Florian Westphal 提交于
      Only used to discard stale data from the subflow, so move
      it where needed.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bfae9dae
    • F
      mptcp: add rmem queue accounting · 600911ff
      Florian Westphal 提交于
      If userspace never drains the receive buffers we must stop draining
      the subflow socket(s) at some point.
      
      This adds the needed rmem accouting for this.
      If the threshold is reached, we stop draining the subflows.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      600911ff
    • F
      mptcp: update mptcp ack sequence from work queue · 6771bfd9
      Florian Westphal 提交于
      If userspace is not reading data, all the mptcp-level acks contain the
      ack_seq from the last time userspace read data rather than the most
      recent in-sequence value.
      
      This causes pointless retransmissions for data that is already queued.
      
      The reason for this is that all the mptcp protocol level processing
      happens at mptcp_recv time.
      
      This adds work queue to move skbs from the subflow sockets receive
      queue on the mptcp socket receive queue (which was not used so far).
      
      This allows us to announce the correct mptcp ack sequence in a timely
      fashion, even when the application does not call recv() on the mptcp socket
      for some time.
      
      We still wake userspace tasks waiting for POLLIN immediately:
      If the mptcp level receive queue is empty (because the work queue is
      still pending) it can be filled from in-sequence subflow sockets at
      recv time without a need to wait for the worker.
      
      The skb_orphan when moving skbs from subflow to mptcp level is needed,
      because the destructor (sock_rfree) relies on skb->sk (ssk!) lock
      being taken.
      
      A followup patch will add needed rmem accouting for the moved skbs.
      
      Other problem: In case application behaves as expected, and calls
      recv() as soon as mptcp socket becomes readable, the work queue will
      only waste cpu cycles.  This will also be addressed in followup patches.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6771bfd9
    • P
      mptcp: add work queue skeleton · 80992017
      Paolo Abeni 提交于
      Will be extended with functionality in followup patches.
      Initial user is moving skbs from subflows receive queue to
      the mptcp-level receive queue.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      80992017