1. 15 9月, 2020 4 次提交
  2. 01 8月, 2020 2 次提交
    • F
      mptcp: enable JOIN requests even if cookies are in use · 9466a1cc
      Florian Westphal 提交于
      JOIN requests do not work in syncookie mode -- for HMAC validation, the
      peers nonce and the mptcp token (to obtain the desired connection socket
      the join is for) are required, but this information is only present in the
      initial syn.
      
      So either we need to drop all JOIN requests once a listening socket enters
      syncookie mode, or we need to store enough state to reconstruct the request
      socket later.
      
      This adds a state table (1024 entries) to store the data present in the
      MP_JOIN syn request and the random nonce used for the cookie syn/ack.
      
      When a MP_JOIN ACK passed cookie validation, the table is consulted
      to rebuild the request socket from it.
      
      An alternate approach would be to "cancel" syn-cookie mode and force
      MP_JOIN to always use a syn queue entry.
      
      However, doing so brings the backlog over the configured queue limit.
      
      v2: use req->syncookie, not (removed) want_cookie arg
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9466a1cc
    • F
      mptcp: subflow: add mptcp_subflow_init_cookie_req helper · c83a47e5
      Florian Westphal 提交于
      Will be used to initialize the mptcp request socket when a MP_CAPABLE
      request was handled in syncookie mode, i.e. when a TCP ACK containing a
      MP_CAPABLE option is a valid syncookie value.
      
      Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
      ID and stores it in the MPTCP token storage to be able to retrieve the
      mptcp socket for subflow joining.
      
      In syncookie case, we do not want to store any state, so just generate the
      unique ID and use it in the reply.
      
      This means there is a small window where another connection could generate
      the same token.
      
      When Cookie ACK comes back, we check that the token has not been registered
      in the mean time.  If it was, the connection needs to fall back to TCP.
      
      Changes in v2:
       - use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
         (Eric Dumazet)
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c83a47e5
  3. 29 7月, 2020 2 次提交
  4. 24 7月, 2020 1 次提交
  5. 22 7月, 2020 1 次提交
  6. 10 7月, 2020 1 次提交
  7. 08 7月, 2020 1 次提交
  8. 02 7月, 2020 1 次提交
    • F
      mptcp: add receive buffer auto-tuning · a6b118fe
      Florian Westphal 提交于
      When mptcp is used, userspace doesn't read from the tcp (subflow)
      socket but from the parent (mptcp) socket receive queue.
      
      skbs are moved from the subflow socket to the mptcp rx queue either from
      'data_ready' callback (if mptcp socket can be locked), a work queue, or
      the socket receive function.
      
      This means tcp_rcv_space_adjust() is never called and thus no receive
      buffer size auto-tuning is done.
      
      An earlier (not merged) patch added tcp_rcv_space_adjust() calls to the
      function that moves skbs from subflow to mptcp socket.
      While this enabled autotuning, it also meant tuning was done even if
      userspace was reading the mptcp socket very slowly.
      
      This adds mptcp_rcv_space_adjust() and calls it after userspace has
      read data from the mptcp socket rx queue.
      
      Its very similar to tcp_rcv_space_adjust, with two differences:
      
      1. The rtt estimate is the largest one observed on a subflow
      2. The rcvbuf size and window clamp of all subflows is adjusted
         to the mptcp-level rcvbuf.
      
      Otherwise, we get spurious drops at tcp (subflow) socket level if
      the skbs are not moved to the mptcp socket fast enough.
      
      Before:
      time mptcp_connect.sh -t -f $((4*1024*1024)) -d 300 -l 0.01% -r 0 -e "" -m mmap
      [..]
      ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration 40823ms) [ OK ]
      ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration 23119ms) [ OK ]
      ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5421ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration 41446ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration 23427ms) [ OK ]
      ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5426ms) [ OK ]
      Time: 1396 seconds
      
      After:
      ns4 MPTCP -> ns3 (10.0.3.2:10108      ) MPTCP   (duration  5417ms) [ OK ]
      ns4 MPTCP -> ns3 (10.0.3.2:10109      ) TCP     (duration  5427ms) [ OK ]
      ns4 TCP   -> ns3 (10.0.3.2:10110      ) MPTCP   (duration  5422ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10111) MPTCP   (duration  5415ms) [ OK ]
      ns4 MPTCP -> ns3 (dead:beef:3::2:10112) TCP     (duration  5422ms) [ OK ]
      ns4 TCP   -> ns3 (dead:beef:3::2:10113) MPTCP   (duration  5423ms) [ OK ]
      Time: 296 seconds
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6b118fe
  9. 30 6月, 2020 2 次提交
    • D
      mptcp: fallback in case of simultaneous connect · 8fd73804
      Davide Caratti 提交于
      when a MPTCP client tries to connect to itself, tcp_finish_connect() is
      never reached. Because of this, depending on the socket current state,
      multiple faulty behaviours can be observed:
      
      1) a WARN_ON() in subflow_data_ready() is hit
       WARNING: CPU: 2 PID: 882 at net/mptcp/subflow.c:911 subflow_data_ready+0x18b/0x230
       [...]
       CPU: 2 PID: 882 Comm: gh35 Not tainted 5.7.0+ #187
       [...]
       RIP: 0010:subflow_data_ready+0x18b/0x230
       [...]
       Call Trace:
        tcp_data_queue+0xd2f/0x4250
        tcp_rcv_state_process+0xb1c/0x49d3
        tcp_v4_do_rcv+0x2bc/0x790
        __release_sock+0x153/0x2d0
        release_sock+0x4f/0x170
        mptcp_shutdown+0x167/0x4e0
        __sys_shutdown+0xe6/0x180
        __x64_sys_shutdown+0x50/0x70
        do_syscall_64+0x9a/0x370
        entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      2) client is stuck forever in mptcp_sendmsg() because the socket is not
         TCP_ESTABLISHED
      
       crash> bt 4847
       PID: 4847   TASK: ffff88814b2fb100  CPU: 1   COMMAND: "gh35"
        #0 [ffff8881376ff680] __schedule at ffffffff97248da4
        #1 [ffff8881376ff778] schedule at ffffffff9724a34f
        #2 [ffff8881376ff7a0] schedule_timeout at ffffffff97252ba0
        #3 [ffff8881376ff8a8] wait_woken at ffffffff958ab4ba
        #4 [ffff8881376ff940] sk_stream_wait_connect at ffffffff96c2d859
        #5 [ffff8881376ffa28] mptcp_sendmsg at ffffffff97207fca
        #6 [ffff8881376ffbc0] sock_sendmsg at ffffffff96be1b5b
        #7 [ffff8881376ffbe8] sock_write_iter at ffffffff96be1daa
        #8 [ffff8881376ffce8] new_sync_write at ffffffff95e5cb52
        #9 [ffff8881376ffe50] vfs_write at ffffffff95e6547f
       #10 [ffff8881376ffe90] ksys_write at ffffffff95e65d26
       #11 [ffff8881376fff28] do_syscall_64 at ffffffff956088ba
       #12 [ffff8881376fff50] entry_SYSCALL_64_after_hwframe at ffffffff9740008c
           RIP: 00007f126f6956ed  RSP: 00007ffc2a320278  RFLAGS: 00000217
           RAX: ffffffffffffffda  RBX: 0000000020000044  RCX: 00007f126f6956ed
           RDX: 0000000000000004  RSI: 00000000004007b8  RDI: 0000000000000003
           RBP: 00007ffc2a3202a0   R8: 0000000000400720   R9: 0000000000400720
           R10: 0000000000400720  R11: 0000000000000217  R12: 00000000004004b0
           R13: 00007ffc2a320380  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: 0000000000000001  CS: 0033  SS: 002b
      
      3) tcpdump captures show that DSS is exchanged even when MP_CAPABLE handshake
         didn't complete.
      
       $ tcpdump -tnnr bad.pcap
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S], seq 3208913911, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291694721,nop,wscale 7,mptcp capable v1], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [S.], seq 3208913911, ack 3208913912, win 65483, options [mss 65495,sackOK,TS val 3291706876 ecr 3291706876,nop,wscale 7,mptcp capable v1], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 1, win 512, options [nop,nop,TS val 3291706876 ecr 3291706876], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [F.], seq 1, ack 1, win 512, options [nop,nop,TS val 3291707876 ecr 3291706876,mptcp dss fin seq 0 subseq 0 len 1,nop,nop], length 0
       IP 127.0.0.1.20000 > 127.0.0.1.20000: Flags [.], ack 2, win 512, options [nop,nop,TS val 3291707876 ecr 3291707876], length 0
      
      force a fallback to TCP in these cases, and adjust the main socket
      state to avoid hanging in mptcp_sendmsg().
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/35Reported-by: NChristoph Paasch <cpaasch@apple.com>
      Suggested-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8fd73804
    • D
      net: mptcp: improve fallback to TCP · e1ff9e82
      Davide Caratti 提交于
      Keep using MPTCP sockets and a use "dummy mapping" in case of fallback
      to regular TCP. When fallback is triggered, skip addition of the MPTCP
      option on send.
      
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/11
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/22Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1ff9e82
  10. 27 6月, 2020 2 次提交
  11. 24 6月, 2020 2 次提交
  12. 19 6月, 2020 1 次提交
  13. 16 6月, 2020 2 次提交
  14. 23 5月, 2020 1 次提交
  15. 17 5月, 2020 1 次提交
    • C
      mptcp: Use 32-bit DATA_ACK when possible · a0c1d0ea
      Christoph Paasch 提交于
      RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not
      sending 64-bit data-sequence numbers. The 64-bit DSN is only there for
      extreme scenarios when a very high throughput subflow is combined with a
      long-RTT subflow such that the high-throughput subflow wraps around the
      32-bit sequence number space within an RTT of the high-RTT subflow.
      
      It is thus a rare scenario and we should try to use the 32-bit DATA_ACK
      instead as long as possible. It allows to reduce the TCP-option overhead
      by 4 bytes, thus makes space for an additional SACK-block. It also makes
      tcpdumps much easier to read when the DSN and DATA_ACK are both either
      32 or 64-bit.
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0c1d0ea
  16. 01 5月, 2020 1 次提交
    • P
      mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Paolo Abeni 提交于
      The mptcp_options_received structure carries several per
      packet flags (mp_capable, mp_join, etc.). Such fields must
      be cleared on each packet, even on dropped ones or packet
      not carrying any MPTCP options, but the current mptcp
      code clears them only on TCP option reset.
      
      On several races/corner cases we end-up with stray bits in
      incoming options, leading to WARN_ON splats. e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue clearing explicitly the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.
      
      This allows us dropping an MPTCP hook from the TCP code and
      removing the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options(). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP socket will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfde141e
  17. 21 4月, 2020 1 次提交
  18. 02 4月, 2020 1 次提交
  19. 30 3月, 2020 12 次提交
  20. 20 3月, 2020 1 次提交