1. 17 11月, 2020 3 次提交
    • F
      mptcp: keep track of advertised windows right edge · 6f8a612a
      Florian Westphal 提交于
      Before sending 'x' new bytes also check that the new snd_una would
      be within the permitted receive window.
      
      For every ACK that also contains a DSS ack, check whether its tcp-level
      receive window would advance the current mptcp window right edge and
      update it if so.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      6f8a612a
    • P
      mptcp: refactor shutdown and close · e16163b6
      Paolo Abeni 提交于
      We must not close the subflows before all the MPTCP level
      data, comprising the DATA_FIN has been acked at the MPTCP
      level, otherwise we could be unable to retransmit as needed.
      
      __mptcp_wr_shutdown() shutdown is responsible to check for the
      correct status and close all subflows. Is called by the output
      path after spooling any data and at shutdown/close time.
      
      In a similar way, __mptcp_destroy_sock() is responsible to clean-up
      the MPTCP level status, and is called when the msk transition
      to TCP_CLOSE.
      
      The protocol level close() does not force anymore the TCP_CLOSE
      status, but orphan the msk socket and all the subflows.
      Orphaned msk sockets are forciby closed after a timeout or
      when all MPTCP-level data is acked.
      
      There is a caveat about keeping the orphaned subflows around:
      the TCP stack can asynchronusly call tcp_cleanup_ulp() on them via
      tcp_close(). To prevent accessing freed memory on later MPTCP
      level operations, the msk acquires a reference to each subflow
      socket and prevent subflow_ulp_release() from releasing the
      subflow context before __mptcp_destroy_sock().
      
      The additional subflow references are released by __mptcp_done()
      and the async ULP release is detected checking ULP ops. If such
      field has been already cleared by the ULP release path, the
      dangling context is freed directly by __mptcp_done().
      Co-developed-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      e16163b6
    • P
      mptcp: introduce MPTCP snd_nxt · eaa2ffab
      Paolo Abeni 提交于
      Track the next MPTCP sequence number used on xmit,
      currently always equal to write_next.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      eaa2ffab
  2. 21 10月, 2020 2 次提交
  3. 11 10月, 2020 1 次提交
  4. 09 10月, 2020 1 次提交
  5. 06 10月, 2020 1 次提交
  6. 04 10月, 2020 1 次提交
  7. 30 9月, 2020 2 次提交
  8. 25 9月, 2020 7 次提交
  9. 29 7月, 2020 5 次提交
  10. 24 7月, 2020 1 次提交
  11. 08 7月, 2020 1 次提交
  12. 30 6月, 2020 1 次提交
  13. 23 6月, 2020 1 次提交
  14. 09 6月, 2020 1 次提交
  15. 23 5月, 2020 1 次提交
  16. 20 5月, 2020 1 次提交
  17. 17 5月, 2020 1 次提交
    • C
      mptcp: Use 32-bit DATA_ACK when possible · a0c1d0ea
      Christoph Paasch 提交于
      RFC8684 allows to send 32-bit DATA_ACKs as long as the peer is not
      sending 64-bit data-sequence numbers. The 64-bit DSN is only there for
      extreme scenarios when a very high throughput subflow is combined with a
      long-RTT subflow such that the high-throughput subflow wraps around the
      32-bit sequence number space within an RTT of the high-RTT subflow.
      
      It is thus a rare scenario and we should try to use the 32-bit DATA_ACK
      instead as long as possible. It allows to reduce the TCP-option overhead
      by 4 bytes, thus makes space for an additional SACK-block. It also makes
      tcpdumps much easier to read when the DSN and DATA_ACK are both either
      32 or 64-bit.
      Signed-off-by: NChristoph Paasch <cpaasch@apple.com>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a0c1d0ea
  18. 01 5月, 2020 5 次提交
    • P
      mptcp: initialize the data_fin field for mpc packets · a77895db
      Paolo Abeni 提交于
      When parsing MPC+data packets we set the dss field, so
      we must also initialize the data_fin, or we can find stray
      value there.
      
      Fixes: 9a19371b ("mptcp: fix data_fin handing in RX path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a77895db
    • P
      mptcp: fix 'use_ack' option access. · 5a91e32b
      Paolo Abeni 提交于
      The mentioned RX option field is initialized only for DSS
      packet, we must access it only if 'dss' is set too, or
      the subflow will end-up in a bad status, leading to
      RFC violations.
      
      Fixes: d22f4988 ("mptcp: process MP_CAPABLE data option")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a91e32b
    • P
      mptcp: avoid a WARN on bad input. · d6085fe1
      Paolo Abeni 提交于
      Syzcaller has found a way to trigger the WARN_ON_ONCE condition
      in check_fully_established().
      
      The root cause is a legit fallback to TCP scenario, so replace
      the WARN with a plain message on a more strict condition.
      
      Fixes: f296234c ("mptcp: Add handling of incoming MP_JOIN requests")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d6085fe1
    • P
      mptcp: move option parsing into mptcp_incoming_options() · cfde141e
      Paolo Abeni 提交于
      The mptcp_options_received structure carries several per
      packet flags (mp_capable, mp_join, etc.). Such fields must
      be cleared on each packet, even on dropped ones or packet
      not carrying any MPTCP options, but the current mptcp
      code clears them only on TCP option reset.
      
      On several races/corner cases we end-up with stray bits in
      incoming options, leading to WARN_ON splats. e.g.:
      
      [  171.164906] Bad mapping: ssn=32714 map_seq=1 map_data_len=32713
      [  171.165006] WARNING: CPU: 1 PID: 5026 at net/mptcp/subflow.c:533 warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.167632] Modules linked in: ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel geneve ip6_udp_tunnel udp_tunnel macsec macvtap tap ipvlan macvlan 8021q garp mrp xfrm_interface veth netdevsim nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun binfmt_misc intel_rapl_msr intel_rapl_common rfkill kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel joydev virtio_balloon pcspkr i2c_piix4 sunrpc ip_tables xfs libcrc32c crc32c_intel serio_raw virtio_console ata_generic virtio_blk virtio_net net_failover failover ata_piix libata
      [  171.199464] CPU: 1 PID: 5026 Comm: repro Not tainted 5.7.0-rc1.mptcp_f227fdf5d388+ #95
      [  171.200886] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-2.fc30 04/01/2014
      [  171.202546] RIP: 0010:warn_bad_map (linux-mptcp/net/mptcp/subflow.c:533 linux-mptcp/net/mptcp/subflow.c:531)
      [  171.206537] Code: c1 ea 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 04 84 d2 75 1d 8b 55 3c 44 89 e6 48 c7 c7 20 51 13 95 e8 37 8b 22 fe <0f> 0b 48 83 c4 08 5b 5d 41 5c c3 89 4c 24 04 e8 db d6 94 fe 8b 4c
      [  171.220473] RSP: 0018:ffffc90000150560 EFLAGS: 00010282
      [  171.221639] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      [  171.223108] RDX: 0000000000000000 RSI: 0000000000000008 RDI: fffff5200002a09e
      [  171.224388] RBP: ffff8880aa6e3c00 R08: 0000000000000001 R09: fffffbfff2ec9955
      [  171.225706] R10: ffffffff9764caa7 R11: fffffbfff2ec9954 R12: 0000000000007fca
      [  171.227211] R13: ffff8881066f4a7f R14: ffff8880aa6e3c00 R15: 0000000000000020
      [  171.228460] FS:  00007f8623719740(0000) GS:ffff88810be00000(0000) knlGS:0000000000000000
      [  171.230065] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  171.231303] CR2: 00007ffdab190a50 CR3: 00000001038ea006 CR4: 0000000000160ee0
      [  171.232586] Call Trace:
      [  171.233109]  <IRQ>
      [  171.233531] get_mapping_status (linux-mptcp/net/mptcp/subflow.c:691)
      [  171.234371] mptcp_subflow_data_available (linux-mptcp/net/mptcp/subflow.c:736 linux-mptcp/net/mptcp/subflow.c:832)
      [  171.238181] subflow_state_change (linux-mptcp/net/mptcp/subflow.c:1085 (discriminator 1))
      [  171.239066] tcp_fin (linux-mptcp/net/ipv4/tcp_input.c:4217)
      [  171.240123] tcp_data_queue (linux-mptcp/./include/linux/compiler.h:199 linux-mptcp/net/ipv4/tcp_input.c:4822)
      [  171.245083] tcp_rcv_established (linux-mptcp/./include/linux/skbuff.h:1785 linux-mptcp/./include/net/tcp.h:1774 linux-mptcp/./include/net/tcp.h:1847 linux-mptcp/net/ipv4/tcp_input.c:5238 linux-mptcp/net/ipv4/tcp_input.c:5730)
      [  171.254089] tcp_v4_rcv (linux-mptcp/./include/linux/spinlock.h:393 linux-mptcp/net/ipv4/tcp_ipv4.c:2009)
      [  171.258969] ip_protocol_deliver_rcu (linux-mptcp/net/ipv4/ip_input.c:204 (discriminator 1))
      [  171.260214] ip_local_deliver_finish (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/ipv4/ip_input.c:232)
      [  171.261389] ip_local_deliver (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:252)
      [  171.265884] ip_rcv (linux-mptcp/./include/linux/netfilter.h:307 linux-mptcp/./include/linux/netfilter.h:301 linux-mptcp/net/ipv4/ip_input.c:539)
      [  171.273666] process_backlog (linux-mptcp/./include/linux/rcupdate.h:651 linux-mptcp/net/core/dev.c:6135)
      [  171.275328] net_rx_action (linux-mptcp/net/core/dev.c:6572 linux-mptcp/net/core/dev.c:6640)
      [  171.280472] __do_softirq (linux-mptcp/./arch/x86/include/asm/jump_label.h:25 linux-mptcp/./include/linux/jump_label.h:200 linux-mptcp/./include/trace/events/irq.h:142 linux-mptcp/kernel/softirq.c:293)
      [  171.281379] do_softirq_own_stack (linux-mptcp/arch/x86/entry/entry_64.S:1083)
      [  171.282358]  </IRQ>
      
      We could address the issue clearing explicitly the relevant fields
      in several places - tcp_parse_option, tcp_fast_parse_options,
      possibly others.
      
      Instead we move the MPTCP option parsing into the already existing
      mptcp ingress hook, so that we need to clear the fields in a single
      place.
      
      This allows us dropping an MPTCP hook from the TCP code and
      removing the quite large mptcp_options_received from the tcp_sock
      struct. On the flip side, the MPTCP sockets will traverse the
      option space twice (in tcp_parse_option() and in
      mptcp_incoming_options(). That looks acceptable: we already
      do that for syn and 3rd ack packets, plain TCP socket will
      benefit from it, and even MPTCP sockets will experience better
      code locality, reducing the jumps between TCP and MPTCP code.
      
      v1 -> v2:
       - rebased on current '-net' tree
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfde141e
    • P
      mptcp: consolidate synack processing. · 263e1201
      Paolo Abeni 提交于
      Currently the MPTCP code uses 2 hooks to process syn-ack
      packets, mptcp_rcv_synsent() and the sk_rx_dst_set()
      callback.
      
      We can drop the first, moving the relevant code into the
      latter, reducing the hooking into the TCP code. This is
      also needed by the next patch.
      
      v1 -> v2:
       - use local tcp sock ptr instead of casting the sk variable
         several times - DaveM
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      263e1201
  19. 23 4月, 2020 1 次提交
    • P
      mptcp: fix data_fin handing in RX path · 9a19371b
      Paolo Abeni 提交于
      The data fin flag is set only via a DSS option, but
      mptcp_incoming_options() copies it unconditionally from the
      provided RX options.
      
      Since we do not clear all the mptcp sock RX options in a
      socket free/alloc cycle, we can end-up with a stray data_fin
      value while parsing e.g. MPC packets.
      
      That would lead to mapping data corruption and will trigger
      a few WARN_ON() in the RX path.
      
      Instead of adding a costly memset(), fetch the data_fin flag
      only for DSS packets - when we always explicitly initialize
      such bit at option parsing time.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9a19371b
  20. 04 4月, 2020 1 次提交
    • G
      mptcp: add some missing pr_fmt defines · c85adced
      Geliang Tang 提交于
      Some of the mptcp logs didn't print out the format string:
      
      [  185.651493] DSS
      [  185.651494] data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  185.651494] data_ack=13792750332298763796
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=0 skb=0000000063dc595d
      [  185.651495] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 status=0
      [  185.651495] MPTCP: msk ack_seq=9bbc894565aa2f9a subflow ack_seq=9bbc894565aa2f9a
      [  185.651496] MPTCP: msk=00000000c4b81cfc ssk=000000009743af53 data_avail=1 skb=0000000012e809e1
      
      So this patch added these missing pr_fmt defines. Then we can get the same
      format string "MPTCP" in all mptcp logs like this:
      
      [  142.795829] MPTCP: DSS
      [  142.795829] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1
      [  142.795829] MPTCP: data_ack=8089704603109242421
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=0 skb=00000000d5f230df
      [  142.795830] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 status=0
      [  142.795831] MPTCP: msk ack_seq=66790290f1199d9b subflow ack_seq=66790290f1199d9b
      [  142.795831] MPTCP: msk=00000000133a24e0 ssk=000000002e508c64 data_avail=1 skb=00000000de5aca2e
      Signed-off-by: NGeliang Tang <geliangtang@gmail.com>
      Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c85adced
  21. 30 3月, 2020 2 次提交