1. 08 Aug, 2020 1 commit
  2. 06 Aug, 2020 1 commit
    • mptcp: be careful on subflow creation · adf73410
      Authored by Paolo Abeni
      Nicolas reported the following oops:
      
      [ 1521.392541] BUG: kernel NULL pointer dereference, address: 00000000000000c0
      [ 1521.394189] #PF: supervisor read access in kernel mode
      [ 1521.395376] #PF: error_code(0x0000) - not-present page
      [ 1521.396607] PGD 0 P4D 0
      [ 1521.397156] Oops: 0000 [#1] SMP PTI
      [ 1521.398020] CPU: 0 PID: 22986 Comm: kworker/0:2 Not tainted 5.8.0-rc4+ #109
      [ 1521.399618] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      [ 1521.401728] Workqueue: events mptcp_worker
      [ 1521.402651] RIP: 0010:mptcp_subflow_create_socket+0xf1/0x1c0
      [ 1521.403954] Code: 24 08 89 44 24 04 48 8b 7a 18 e8 2a 48 d4 ff 8b 44 24 04 85 c0 75 7a 48 8b 8b 78 02 00 00 48 8b 54 24 08 48 8d bb 80 00 00 00 <48> 8b 89 c0 00 00 00 48 89 8a c0 00 00 00 48 8b 8b 78 02 00 00 8b
      [ 1521.408201] RSP: 0000:ffffabc4002d3c60 EFLAGS: 00010246
      [ 1521.409433] RAX: 0000000000000000 RBX: ffffa0b9ad8c9a00 RCX: 0000000000000000
      [ 1521.411096] RDX: ffffa0b9ae78a300 RSI: 00000000fffffe01 RDI: ffffa0b9ad8c9a80
      [ 1521.412734] RBP: ffffa0b9adff2e80 R08: ffffa0b9af02d640 R09: ffffa0b9ad923a00
      [ 1521.414333] R10: ffffabc4007139f8 R11: fefefefefefefeff R12: ffffabc4002d3cb0
      [ 1521.415918] R13: ffffa0b9ad91fa58 R14: ffffa0b9ad8c9f9c R15: 0000000000000000
      [ 1521.417592] FS:  0000000000000000(0000) GS:ffffa0b9af000000(0000) knlGS:0000000000000000
      [ 1521.419490] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1521.420839] CR2: 00000000000000c0 CR3: 000000002951e006 CR4: 0000000000160ef0
      [ 1521.422511] Call Trace:
      [ 1521.423103]  __mptcp_subflow_connect+0x94/0x1f0
      [ 1521.425376]  mptcp_pm_create_subflow_or_signal_addr+0x200/0x2a0
      [ 1521.426736]  mptcp_worker+0x31b/0x390
      [ 1521.431324]  process_one_work+0x1fc/0x3f0
      [ 1521.432268]  worker_thread+0x2d/0x3b0
      [ 1521.434197]  kthread+0x117/0x130
      [ 1521.435783]  ret_from_fork+0x22/0x30
      
      on some unconventional configuration.
      
      The MPTCP protocol is trying to create a subflow for an
      unaccepted server socket. That is allowed by the RFC, even
      if subflow creation will likely fail.
      Unaccepted sockets still have a NULL sk_socket field;
      avoid the issue by failing earlier.
      Reported-and-tested-by: Nicolas Rybowski <nicolas.rybowski@tessares.net>
      Fixes: 7d14b0d2 ("mptcp: set correct vfs info for subflows")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 01 Aug, 2020 5 commits
  4. 29 Jul, 2020 2 commits
    • mptcp: Only use subflow EOF signaling on fallback connections · 067a0b3d
      Authored by Mat Martineau
      The MPTCP state machine handles disconnections on non-fallback connections,
      but the mptcp_sock still needs to get notified when fallback subflows
      disconnect.
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: Use full MPTCP-level disconnect state machine · 43b54c6e
      Authored by Mat Martineau
      RFC 8684 appendix D describes the connection state machine for
      MPTCP. This patch implements the DATA_FIN / DATA_ACK exchanges and
      MPTCP-level socket state changes described in that appendix, rather than
      simply sending DATA_FIN along with TCP FIN when disconnecting subflows.
      
      DATA_FIN is now sent and acknowledged before shutting down the
      subflows. Received DATA_FIN information (if not part of a data packet)
      is written to the MPTCP socket when the incoming DSS option is parsed by
      the subflow, and the MPTCP worker is scheduled to process the
      flag. DATA_FIN received as part of a full DSS mapping will be handled
      when the mapping is processed.
      
      The DATA_FIN is acknowledged by the worker if the reader is caught
      up. If there is still data to be moved to the MPTCP-level queue, ack_seq
      will be incremented to account for the DATA_FIN when it reaches the end
      of the stream and a DATA_ACK will be sent to the peer.
      Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
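The DATA_FIN handling described in commit 43b54c6e can be illustrated with a small userspace model. This is a minimal Python sketch, not the kernel implementation: the class `DataFinReceiver` and its method names are hypothetical, and the model assumes only what the commit message and RFC 8684 state, namely that a DATA_FIN occupies one unit of data sequence space and is acknowledged once the reader has caught up with the stream.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataFinReceiver:
    """Toy model of MPTCP-level DATA_FIN handling (hypothetical names)."""
    ack_seq: int = 0                        # next data sequence number expected
    pending_fin_seq: Optional[int] = None   # sequence number carried by DATA_FIN
    fin_received: bool = False
    sent_acks: List[int] = field(default_factory=list)

    def on_data(self, seq: int, length: int) -> None:
        # Only in-order data is modelled; it advances ack_seq.
        if seq == self.ack_seq:
            self.ack_seq += length
        self._check_fin()

    def on_data_fin(self, fin_seq: int) -> None:
        # Record the DATA_FIN; it takes effect once the reader catches up.
        self.pending_fin_seq = fin_seq
        self._check_fin()

    def _check_fin(self) -> None:
        if self.pending_fin_seq is not None and self.ack_seq == self.pending_fin_seq:
            self.ack_seq += 1                     # DATA_FIN consumes one sequence number
            self.fin_received = True
            self.pending_fin_seq = None
            self.sent_acks.append(self.ack_seq)   # DATA_ACK covering the DATA_FIN
```

If the DATA_FIN arrives while data is still outstanding, the model keeps it pending and emits the DATA_ACK only when the last in-order byte has been accounted for, which mirrors the "reader is caught up" condition in the commit message.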
  5. 24 Jul, 2020 6 commits
  6. 18 Jul, 2020 1 commit
  7. 07 Jul, 2020 1 commit
    • mptcp: fix race in subflow_data_ready() · d47a7215
      Authored by Davide Caratti
      syzkaller was able to make the kernel reach subflow_data_ready() for a
      server subflow that was closed before subflow_finish_connect() completed.
      In these cases we can avoid using the path for regular/fallback MPTCP
      data, and just wake the main socket, to avoid the following warning:
      
       WARNING: CPU: 0 PID: 9370 at net/mptcp/subflow.c:885
       subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
       Kernel panic - not syncing: panic_on_warn set ...
       CPU: 0 PID: 9370 Comm: syz-executor.0 Not tainted 5.7.0 #106
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
       rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
       Call Trace:
        <IRQ>
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0xb7/0xfe lib/dump_stack.c:118
        panic+0x29e/0x692 kernel/panic.c:221
        __warn.cold+0x2f/0x3d kernel/panic.c:582
        report_bug+0x28b/0x2f0 lib/bug.c:195
        fixup_bug arch/x86/kernel/traps.c:105 [inline]
        fixup_bug arch/x86/kernel/traps.c:100 [inline]
        do_error_trap+0x10f/0x180 arch/x86/kernel/traps.c:197
        do_invalid_op+0x32/0x40 arch/x86/kernel/traps.c:216
        invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:1027
       RIP: 0010:subflow_data_ready+0x1e6/0x290 net/mptcp/subflow.c:885
       Code: 04 02 84 c0 74 06 0f 8e 91 00 00 00 41 0f b6 5e 48 31 ff 83 e3 18
       89 de e8 37 ec 3d fe 84 db 0f 85 65 ff ff ff e8 fa ea 3d fe <0f> 0b e9
       59 ff ff ff e8 ee ea 3d fe 48 89 ee 4c 89 ef e8 f3 77 ff
       RSP: 0018:ffff88811b2099b0 EFLAGS: 00010206
       RAX: ffff888111197000 RBX: 0000000000000000 RCX: ffffffff82fbc609
       RDX: 0000000000000100 RSI: ffffffff82fbc616 RDI: 0000000000000001
       RBP: ffff8881111bc800 R08: ffff888111197000 R09: ffffed10222a82af
       R10: ffff888111541577 R11: ffffed10222a82ae R12: 1ffff11023641336
       R13: ffff888111541000 R14: ffff88810fd4ca00 R15: ffff888111541570
        tcp_child_process+0x754/0x920 net/ipv4/tcp_minisocks.c:841
        tcp_v4_do_rcv+0x749/0x8b0 net/ipv4/tcp_ipv4.c:1642
        tcp_v4_rcv+0x2666/0x2e60 net/ipv4/tcp_ipv4.c:1999
        ip_protocol_deliver_rcu+0x29/0x1f0 net/ipv4/ip_input.c:204
        ip_local_deliver_finish net/ipv4/ip_input.c:231 [inline]
        NF_HOOK include/linux/netfilter.h:421 [inline]
        ip_local_deliver+0x2da/0x390 net/ipv4/ip_input.c:252
        dst_input include/net/dst.h:441 [inline]
        ip_rcv_finish net/ipv4/ip_input.c:428 [inline]
        ip_rcv_finish net/ipv4/ip_input.c:414 [inline]
        NF_HOOK include/linux/netfilter.h:421 [inline]
        ip_rcv+0xef/0x140 net/ipv4/ip_input.c:539
        __netif_receive_skb_one_core+0x197/0x1e0 net/core/dev.c:5268
        __netif_receive_skb+0x27/0x1c0 net/core/dev.c:5382
        process_backlog+0x1e5/0x6d0 net/core/dev.c:6226
        napi_poll net/core/dev.c:6671 [inline]
        net_rx_action+0x3e3/0xd70 net/core/dev.c:6739
        __do_softirq+0x18c/0x634 kernel/softirq.c:292
        do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082
        </IRQ>
        do_softirq.part.0+0x26/0x30 kernel/softirq.c:337
        do_softirq arch/x86/include/asm/preempt.h:26 [inline]
        __local_bh_enable_ip+0x46/0x50 kernel/softirq.c:189
        local_bh_enable include/linux/bottom_half.h:32 [inline]
        rcu_read_unlock_bh include/linux/rcupdate.h:723 [inline]
        ip_finish_output2+0x78a/0x19c0 net/ipv4/ip_output.c:229
        __ip_finish_output+0x471/0x720 net/ipv4/ip_output.c:306
        dst_output include/net/dst.h:435 [inline]
        ip_local_out+0x181/0x1e0 net/ipv4/ip_output.c:125
        __ip_queue_xmit+0x7a1/0x14e0 net/ipv4/ip_output.c:530
        __tcp_transmit_skb+0x19dc/0x35e0 net/ipv4/tcp_output.c:1238
        __tcp_send_ack.part.0+0x3c2/0x5b0 net/ipv4/tcp_output.c:3785
        __tcp_send_ack net/ipv4/tcp_output.c:3791 [inline]
        tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3791
        tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:6040 [inline]
        tcp_rcv_state_process+0x36a4/0x49c2 net/ipv4/tcp_input.c:6209
        tcp_v4_do_rcv+0x343/0x8b0 net/ipv4/tcp_ipv4.c:1651
        sk_backlog_rcv include/net/sock.h:996 [inline]
        __release_sock+0x1ad/0x310 net/core/sock.c:2548
        release_sock+0x54/0x1a0 net/core/sock.c:3064
        inet_wait_for_connect net/ipv4/af_inet.c:594 [inline]
        __inet_stream_connect+0x57e/0xd50 net/ipv4/af_inet.c:686
        inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:725
        mptcp_stream_connect+0x171/0x5f0 net/mptcp/protocol.c:1920
        __sys_connect_file net/socket.c:1854 [inline]
        __sys_connect+0x267/0x2f0 net/socket.c:1871
        __do_sys_connect net/socket.c:1882 [inline]
        __se_sys_connect net/socket.c:1879 [inline]
        __x64_sys_connect+0x6f/0xb0 net/socket.c:1879
        do_syscall_64+0xb7/0x3d0 arch/x86/entry/common.c:295
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
       RIP: 0033:0x7fb577d06469
       Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89
       f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01
       f0 ff ff 73 01 c3 48 8b 0d ff 49 2b 00 f7 d8 64 89 01 48
       RSP: 002b:00007fb5783d5dd8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
       RAX: ffffffffffffffda RBX: 000000000068bfa0 RCX: 00007fb577d06469
       RDX: 000000000000004d RSI: 0000000020000040 RDI: 0000000000000003
       RBP: 00000000ffffffff R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
       R13: 000000000041427c R14: 00007fb5783d65c0 R15: 0000000000000003
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/39
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Fixes: e1ff9e82 ("net: mptcp: improve fallback to TCP")
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 02 Jul, 2020 1 commit
    • mptcp: add receive buffer auto-tuning · a6b118fe
      Authored by Florian Westphal
      When mptcp is used, userspace doesn't read from the tcp (subflow)
      socket but from the parent (mptcp) socket receive queue.
      
      skbs are moved from the subflow socket to the mptcp rx queue either from
      'data_ready' callback (if mptcp socket can be locked), a work queue, or
      the socket receive function.
      
      This means tcp_rcv_space_adjust() is never called and thus no receive
      buffer size auto-tuning is done.
      
      An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
      function that moves skbs from subflow to mptcp socket.
      While this enabled autotuning, it also meant tuning was done even if
      userspace was reading the mptcp socket very slowly.
      
      This adds mptcp_rcv_space_adjust() and calls it after userspace has
      read data from the mptcp socket rx queue.
      
      It's very similar to tcp_rcv_space_adjust(), with two differences:
      
      1. The rtt estimate is the largest one observed on a subflow
      2. The rcvbuf size and window clamp of all subflows are adjusted
         to the mptcp-level rcvbuf.
      
      Otherwise, we get spurious drops at tcp (subflow) socket level if
      the skbs are not moved to the mptcp socket fast enough.
      
      Before:
      time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
      [..]
      ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40823ms) [ OK ]
      ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration 23119ms) [ OK ]
      ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5421ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41446ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration 23427ms) [ OK ]
      ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5426ms) [ OK ]
      Time: 1396 seconds
      
      After:
      ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
      ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5427ms) [ OK ]
      ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5422ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5415ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5422ms) [ OK ]
      ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5423ms) [ OK ]
      Time: 296 seconds
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
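The two differences from tcp_rcv_space_adjust() listed in commit a6b118fe can be sketched as follows. This is a simplified Python model under stated assumptions: the `Subflow` class and both helper names are hypothetical, the kernel's actual buffer-growth formula is deliberately left out, and setting the window clamp to the same value as the rcvbuf is a simplification made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    rtt_us: int        # receive RTT estimate for this subflow, in microseconds
    rcvbuf: int        # subflow receive buffer size, in bytes
    window_clamp: int  # upper bound on the advertised window, in bytes


def largest_rtt_us(subflows: List[Subflow]) -> int:
    """Difference 1: the MPTCP-level RTT estimate is the largest subflow RTT."""
    return max((sf.rtt_us for sf in subflows), default=0)


def propagate_rcvbuf(subflows: List[Subflow], msk_rcvbuf: int) -> None:
    """Difference 2: align every subflow with the MPTCP-level rcvbuf."""
    for sf in subflows:
        sf.rcvbuf = msk_rcvbuf
        # Assumption: the clamp is set to the same value in this sketch.
        sf.window_clamp = msk_rcvbuf
```

Using the slowest subflow's RTT makes the measurement interval long enough for every path, and propagating the buffer size keeps a small subflow buffer from dropping packets while data waits to be moved to the MPTCP socket.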
  9. 01 Jul, 2020 1 commit
  10. 30 Jun, 2020 2 commits
    • mptcp: fallback in case of simultaneous connect · 8fd73804
      Authored by Davide Caratti
      When an MPTCP client tries to connect to itself, tcp_finish_connect() is
      never reached. Because of this, depending on the socket's current state,
      multiple faulty behaviours can be observed:
      
      1) a WARN_ON() in subflow_data_ready() is hit
       WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
       [...]
       CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
       [...]
       RIP: 0010:subflow_data_ready+0x18b/0x230
       [...]
       Call Trace:
        tcp_data_queue+0xd2f/0x4250
        tcp_rcv_state_process+0xb1c/0x49d3
        tcp_v4_do_rcv+0x2bc/0x790
        __release_sock+0x153/0x2d0
        release_sock+0x4f/0x170
        mptcp_shutdown+0x167/0x4e0
        __sys_shutdown+0xe6/0x180
        __x64_sys_shutdown+0x50/0x70
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      2) client is stuck forever in mptcp_sendmsg() because the socket is not
         TCP_ESTABLISHED
      
       crash> bt 4847
       PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
        #0 [ffff8881376ff680] __schedule at ffffffff97248da4
        #1 [ffff8881376ff778] schedule at ffffffff9724a34f
        #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
        #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
        #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
        #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
        #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
        #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
        #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
        #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
       #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
       #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
       #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
           RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
           RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
           RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
           RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
           R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
           R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
         didn't complete.
      
       $ tcpdump -tnnr bad.pcap
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
      
      Force a fallback to TCP in these cases, and adjust the main socket
      state to avoid hanging in mptcp_sendmsg().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35
      Reported-by: Christoph Paasch <cpaasch@apple.com>
      Suggested-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: mptcp: improve fallback to TCP · e1ff9e82
      Authored by Davide Caratti
      Keep using MPTCP sockets and use a "dummy mapping" in case of fallback
      to regular TCP. When fallback is triggered, skip addition of the MPTCP
      option on send.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22
      Co-developed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 27 Jun, 2020 2 commits
  12. 19 Jun, 2020 2 commits
  13. 16 Jun, 2020 1 commit
    • mptcp: fix memory leak in mptcp_subflow_create_socket() · b8ad540d
      Authored by Wei Yongjun
      The socket allocated by sock_create_kern() should be released before
      returning from the error handling path; otherwise it causes a memory leak.
      
      unreferenced object 0xffff88810910c000 (size 1216):
        comm "00000003_test_m", pid 12238, jiffies 4295050289 (age 54.237s)
        hex dump (first 32 bytes):
          01 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 2f 30 0a 81 88 ff ff  ........./0.....
        backtrace:
          [<00000000e877f89f>] sock_alloc_inode+0x18/0x1c0
          [<0000000093d1dd51>] alloc_inode+0x63/0x1d0
          [<000000005673fec6>] new_inode_pseudo+0x14/0xe0
          [<00000000b5db6be8>] sock_alloc+0x3c/0x260
          [<00000000e7e3cbb2>] __sock_create+0x89/0x620
          [<0000000023e48593>] mptcp_subflow_create_socket+0xc0/0x5e0
          [<00000000419795e4>] __mptcp_socket_create+0x1ad/0x3f0
          [<00000000b2f942e8>] mptcp_stream_connect+0x281/0x4f0
          [<00000000c80cd5cc>] __sys_connect_file+0x14d/0x190
          [<00000000dc761f11>] __sys_connect+0x128/0x160
          [<000000008b14e764>] __x64_sys_connect+0x6f/0xb0
          [<000000007b4f93bd>] do_syscall_64+0xa1/0x530
          [<00000000d3e770b6>] entry_SYSCALL_64_after_hwframe+0x49/0xb3
      
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 11 Jun, 2020 1 commit
  15. 31 May, 2020 1 commit
    • mptcp: fix NULL ptr dereference in MP_JOIN error path · 39884604
      Authored by Paolo Abeni
      When token lookup on the MP_JOIN 3rd ack fails, the server
      socket closes the incoming child with a reset. Such a socket
      has the 'is_mptcp' flag set, but no msk socket associated,
      due to the failed lookup.
      
      While crafting the reset packet mptcp_established_options_mp()
      will try to dereference the child's master socket, causing
      a NULL ptr dereference.
      
      This change addresses the issue with an explicit fallback to
      TCP in that error path.
      
      Fixes: 729cd643 ("mptcp: cope better with MP_JOIN failure")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 23 May, 2020 1 commit
  17. 17 May, 2020 1 commit
    • mptcp: Use 32-bit DATA_ACK when possible · a0c1d0ea
      Authored by Christoph Paasch
      RFC 8684 allows sending 32-bit DATA_ACKs as long as the peer is not
      sending 64-bit data-sequence numbers. The 64-bit DSN is only there for
      extreme scenarios when a very high throughput subflow is combined with a
      long-RTT subflow such that the high-throughput subflow wraps around the
      32-bit sequence number space within an RTT of the high-RTT subflow.

      It is thus a rare scenario and we should try to use the 32-bit DATA_ACK
      instead as long as possible. This reduces the TCP-option overhead
      by 4 bytes, making space for an additional SACK block. It also makes
      tcpdumps much easier to read when the DSN and DATA_ACK are both either
      32-bit or 64-bit.
      Signed-off-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
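A peer that receives a 32-bit DATA_ACK, as discussed in commit a0c1d0ea, has to map it back into the 64-bit data sequence space. The Python sketch below shows one common way to do that: pick the 64-bit value with matching low bits that lies closest to a local 64-bit reference. The helper name `expand_ack32` is hypothetical and this is not the kernel's function; it is an illustration of the wraparound arithmetic that makes 32-bit values sufficient outside the extreme scenario the commit message describes.

```python
MASK32 = 0xFFFFFFFF
HALF32 = 1 << 31
SPAN32 = 1 << 32


def expand_ack32(ref64: int, ack32: int) -> int:
    """Return the 64-bit value with low 32 bits ack32 that is closest to ref64."""
    cand = (ref64 & ~MASK32) | (ack32 & MASK32)
    if cand > ref64 + HALF32 and cand >= SPAN32:
        cand -= SPAN32   # ack32 belongs to the previous 32-bit epoch
    elif cand + HALF32 < ref64:
        cand += SPAN32   # ack32 has wrapped into the next 32-bit epoch
    return cand
```

The expansion is unambiguous only while the true value stays within half of the 32-bit space of the reference, which is why the 64-bit form remains available for the high-throughput, long-RTT combination.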
  18. 16 May, 2020 2 commits
    • mptcp: cope better with MP_JOIN failure · 729cd643
      Authored by Paolo Abeni
      Currently, on MP_JOIN failure we reset the child
      socket, but leave the request socket untouched.
      
      tcp_check_req will deal with it according to the
      'tcp_abort_on_overflow' sysctl value - by default the
      req socket will stay alive.
      
      The above leads to inconsistent behavior on MP JOIN
      failure, and bad listener overflow accounting.
      
      This patch addresses the issue by leveraging the infrastructure
      just introduced to ask the TCP stack to drop the req on
      failure.

      The child socket is no longer freed by subflow_syn_recv_sock();
      instead it's moved to a dead state and will be disposed of by the
      next sock_put done by the TCP stack, so that listener overflow
      accounting is not affected by MP JOIN failure.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: add new sock flag to deal with join subflows · 90bf4513
      Authored by Paolo Abeni
      MP_JOIN subflows must not land in the accept queue.
      Currently tcp_check_req() calls an mptcp-specific helper
      to detect such a scenario.

      That helper leverages the subflow context to check for
      MP_JOIN subflows. We also need to deal with MP JOIN
      failures, even when the subflow context is not available
      due to allocation failure.
      
      A possible solution would be changing the syn_recv_sock()
      signature to allow returning a more descriptive action/
      error code and deal with that in tcp_check_req().
      
      Since the above need is MPTCP-specific, this patch instead
      uses a TCP request socket hole to add an MPTCP-specific flag.
      This flag is used by the MPTCP syn_recv_sock() to tell
      tcp_check_req() how to deal with the request socket.

      This change is a no-op for !MPTCP builds, and makes the
      MPTCP code simpler. It also allows the next patch to deal
      correctly with MP JOIN failure.
      
      v1 -> v2:
       - be more conservative on drop_req initialization (Mat)
      
      RFC -> v1:
       - move the drop_req bit inside tcp_request_sock (Eric)
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Reviewed-by: Christoph Paasch <cpaasch@apple.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
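The flag-based hand-off described in commit 90bf4513 can be modelled loosely in userspace. The Python sketch below is an assumption-laden illustration, not the kernel code: `RequestSock`, `mptcp_syn_recv_sock` and `tcp_check_req` here are simplified stand-ins that borrow the names from the commit message, and the returned strings are invented labels for the three outcomes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestSock:
    is_mp_join: bool
    drop_req: bool = False   # MPTCP-specific flag stored in the request socket


def mptcp_syn_recv_sock(req: RequestSock, join_ok: bool) -> Optional[str]:
    """Simplified stand-in for the MPTCP syn_recv_sock() hook."""
    if req.is_mp_join:
        # MP_JOIN subflows never reach the accept queue, on success or failure.
        req.drop_req = True
        return "child" if join_ok else None
    return "child"


def tcp_check_req(req: RequestSock, join_ok: bool = True) -> str:
    """Simplified stand-in for the generic TCP request handling."""
    child = mptcp_syn_recv_sock(req, join_ok)
    if req.drop_req:
        return "dropped"                 # request removed, nothing queued
    return "queued for accept" if child else "kept as request"
```

Carrying the decision in the request socket means the generic TCP path needs no subflow context, so it still behaves correctly when that context could not be allocated.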
  19. 08 May, 2020 1 commit
    • mptcp: set correct vfs info for subflows · 7d14b0d2
      Authored by Paolo Abeni
      When a subflow is created via mptcp_subflow_create_socket(),
      a new 'struct socket' is allocated, with a new i_ino value.
      
      When inspecting TCP sockets via procfs and/or the diag
      interface, the above ones are not related to the process owning
      the MPTCP master socket, even if they are a logical part of it
      ('ss -p' shows an empty process field).
      
      Additionally, subflows created by the path manager get
      the uid/gid from the running workqueue.
      
      Subflows are part of the owning MPTCP master socket, let's
      adjust the vfs info to reflect this.
      
      After this patch, 'ss' correctly displays subflows as belonging
      to the msk socket creator.
      
      Fixes: 2303f994 ("mptcp: Associate MPTCP context with TCP socket")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 01 May, 2020 3 commits
    • mptcp: fix uninitialized value access · ac2b47fb
      Authored by Paolo Abeni
      tcp_v{4,6}_syn_recv_sock() set 'own_req' only when returning
      a non-NULL 'child'. Check 'own_req' only if child is
      available, to avoid a harmless UBSAN splat.
      
      v1 -> v2:
       - reference the correct hash
      
      Fixes: 4c8941de ("mptcp: avoid flipping mp_capable field in syn_recv_sock()")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Authored by Paolo Abeni
      The mptcp_options_received structure carries several per-packet
      flags (mp_capable, mp_join, etc.). Such fields must
      be cleared on each packet, even on dropped ones or packets
      not carrying any MPTCP options, but the current mptcp
      code clears them only on TCP option reset.

      In several races/corner cases we end up with stray bits in
      incoming options, leading to WARN_ON splats, e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue by explicitly clearing the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.

      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.

      This allows us to drop an MPTCP hook from the TCP code and
      remove the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options()). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP sockets will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
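The "clear the fields in a single place" idea from commit cfde141e can be shown with a short userspace model. This Python sketch is illustrative only: `OptionsReceived` is a hypothetical stand-in for mptcp_options_received with just three flags, and the string option kinds replace real TCP option parsing.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class OptionsReceived:
    mp_capable: bool = False
    mp_join: bool = False
    dss: bool = False


def parse_incoming_options(option_kinds: Iterable[str]) -> OptionsReceived:
    """Start from a fully cleared structure for every packet, then fill it."""
    opts = OptionsReceived()   # the single place where all flags are cleared
    for kind in option_kinds:
        if kind == "mp_capable":
            opts.mp_capable = True
        elif kind == "mp_join":
            opts.mp_join = True
        elif kind == "dss":
            opts.dss = True
    return opts
```

Because the structure is rebuilt for each packet, a packet without MPTCP options can never inherit flags left behind by an earlier one, which is the class of stray-bit bug the commit message describes.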
    • mptcp: consolidate synack processing. · 263e1201
      Authored by Paolo Abeni
      Currently the MPTCP code uses 2 hooks to process syn-ack
      packets, mptcp_rcv_synsent() and the sk_rx_dst_set()
      callback.
      
      We can drop the first, moving the relevant code into the
      latter, reducing the hooking into the TCP code. This is
      also needed by the next patch.
      
      v1 -> v2:
       - use local tcp sock ptr instead of casting the sk variable
         several times - DaveM
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 26 Apr, 2020 2 commits
    • mptcp: fix race in msk status update · 1200832c
      Authored by Paolo Abeni
      Currently subflow_finish_connect() unconditionally changes
      any msk socket status other than TCP_ESTABLISHED.

      If a non-blocking connect() races with close(), we can end up
      triggering:
      
      IPv4: Attempt to release TCP socket in state 1 00000000e32b8b7e
      
      when the msk socket is disposed.
      
      Be sure to enter the established status only from SYN_SENT.
      
      Fixes: c3c123d1 ("net: mptcp: don't hang in mptcp_sendmsg() after TCP fallback")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
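The rule stated in commit 1200832c, "enter the established status only from SYN_SENT", amounts to a guarded state transition. The following Python sketch models it with a hypothetical `State` enum and `finish_connect` helper; it covers only three states and is not the kernel's state handling.

```python
from enum import Enum


class State(Enum):
    SYN_SENT = "SYN_SENT"
    ESTABLISHED = "ESTABLISHED"
    CLOSE = "CLOSE"


def finish_connect(msk_state: State) -> State:
    """Move to ESTABLISHED only from SYN_SENT; leave any other state alone."""
    if msk_state is State.SYN_SENT:
        return State.ESTABLISHED
    return msk_state
```

With the guard in place, a socket that close() has already moved out of SYN_SENT is not pulled back into the established state by a late connect completion.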
    • tcp: mptcp: use mptcp receive buffer space to select rcv window · 071c8ed6
      Authored by Florian Westphal
      In MPTCP, the receive window is shared across all subflows, because it
      refers to the mptcp-level sequence space.
      
      MPTCP receivers already place incoming packets on the mptcp socket
      receive queue and will charge it to the mptcp socket rcvbuf until
      userspace consumes the data.
      
      Update __tcp_select_window to use the occupancy of the parent/mptcp
      socket instead of the subflow socket in case the tcp socket is part
      of a logical mptcp connection.
      
      This commit doesn't change the choice of the initial window for passive or
      active connections.
      While it would be possible to change those as well, this adds complexity
      (especially when handling MP_JOIN requests).  Furthermore, the MPTCP RFC
      specifically says that an MPTCP sender 'MUST NOT use the RCV.WND field
      of a TCP segment at the connection level if it does not also carry a DSS
      option with a Data ACK field.'
      
      SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
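The window selection change in commit 071c8ed6 can be summarised as "measure free space on the parent MPTCP socket when the TCP socket is a subflow". The Python sketch below is a minimal model under that reading: the `Sock` class, its field names and `free_space_for_window` are hypothetical, and the real __tcp_select_window() applies further scaling and rounding that is omitted here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sock:
    rcvbuf: int                       # receive buffer limit, in bytes
    rmem_alloc: int                   # bytes currently charged to the socket
    parent: Optional["Sock"] = None   # MPTCP socket, if this is a subflow


def free_space_for_window(sk: Sock) -> int:
    """Use the parent's occupancy for subflows, the socket's own otherwise."""
    src = sk.parent if sk.parent is not None else sk
    return max(src.rcvbuf - src.rmem_alloc, 0)
```

A subflow whose own queue is empty would otherwise advertise a large window even when the shared MPTCP receive queue is nearly full; reading the parent's occupancy avoids that mismatch.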
  22. 21 Apr, 2020 2 commits