1. 28 10月, 2021 1 次提交
    • P
      mptcp: allocate fwd memory separately on the rx and tx path · 6511882c
      Paolo Abeni 提交于
      All the mptcp receive path is protected by the msk socket
      spinlock. As consequences, the tx path has to play a few tricks to
      allocate the forward memory without acquiring the spinlock multiple
      times, making the overall TX path quite complex.
      
      This patch tries to clean-up a bit the tx path, using completely
      separated fwd memory allocation, for the rx and the tx path.
      
      The forward memory allocated in the rx path is now accounted in
      msk->rmem_fwd_alloc and is (still) protected by the msk socket spinlock.
      
      To cope with the above we provide a few MPTCP-specific variants for
      the helpers to charge, uncharge, reclaim and free the forward memory
      in the receive path.
      
      msk->sk_forward_alloc now accounts only the forward memory for the tx
      path, we can use the plain core sock helper to manipulate it and drop
      quite a bit of complexity.
      
      On memory pressure, both rx and tx fwd memories are reclaimed.
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      6511882c
  2. 08 10月, 2021 1 次提交
    • P
      mptcp: fix possible stall on recvmsg() · 612f71d7
      Paolo Abeni 提交于
      recvmsg() can enter an infinite loop if the caller provides the
      MSG_WAITALL, the data present in the receive queue is not sufficient to
      fulfill the request, and no more data is received by the peer.
      
      When the above happens, mptcp_wait_data() will always return with
      no wait, as the MPTCP_DATA_READY flag checked by such function is
      set and never cleared in such code path.
      
      Leveraging the above syzbot was able to trigger an RCU stall:
      
      rcu: INFO: rcu_preempt self-detected stall on CPU
      rcu:    0-...!: (10499 ticks this GP) idle=0af/1/0x4000000000000000 softirq=10678/10678 fqs=1
              (t=10500 jiffies g=13089 q=109)
      rcu: rcu_preempt kthread starved for 10497 jiffies! g13089 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=1
      rcu:    Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
      rcu: RCU grace-period kthread stack dump:
      task:rcu_preempt     state:R  running task     stack:28696 pid:   14 ppid:     2 flags:0x00004000
      Call Trace:
       context_switch kernel/sched/core.c:4955 [inline]
       __schedule+0x940/0x26f0 kernel/sched/core.c:6236
       schedule+0xd3/0x270 kernel/sched/core.c:6315
       schedule_timeout+0x14a/0x2a0 kernel/time/timer.c:1881
       rcu_gp_fqs_loop+0x186/0x810 kernel/rcu/tree.c:1955
       rcu_gp_kthread+0x1de/0x320 kernel/rcu/tree.c:2128
       kthread+0x405/0x4f0 kernel/kthread.c:327
       ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      rcu: Stack dump where RCU GP kthread last ran:
      Sending NMI from CPU 0 to CPUs 1:
      NMI backtrace for cpu 1
      CPU: 1 PID: 8510 Comm: syz-executor827 Not tainted 5.15.0-rc2-next-20210920-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:bytes_is_nonzero mm/kasan/generic.c:84 [inline]
      RIP: 0010:memory_is_nonzero mm/kasan/generic.c:102 [inline]
      RIP: 0010:memory_is_poisoned_n mm/kasan/generic.c:128 [inline]
      RIP: 0010:memory_is_poisoned mm/kasan/generic.c:159 [inline]
      RIP: 0010:check_region_inline mm/kasan/generic.c:180 [inline]
      RIP: 0010:kasan_check_range+0xc8/0x180 mm/kasan/generic.c:189
      Code: 38 00 74 ed 48 8d 50 08 eb 09 48 83 c0 01 48 39 d0 74 7a 80 38 00 74 f2 48 89 c2 b8 01 00 00 00 48 85 d2 75 56 5b 5d 41 5c c3 <48> 85 d2 74 5e 48 01 ea eb 09 48 83 c0 01 48 39 d0 74 50 80 38 00
      RSP: 0018:ffffc9000cd676c8 EFLAGS: 00000283
      RAX: ffffed100e9a110e RBX: ffffed100e9a110f RCX: ffffffff88ea062a
      RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff888074d08870
      RBP: ffffed100e9a110e R08: 0000000000000001 R09: ffff888074d08877
      R10: ffffed100e9a110e R11: 0000000000000000 R12: ffff888074d08000
      R13: ffff888074d08000 R14: ffff888074d08088 R15: ffff888074d08000
      FS:  0000555556d8e300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      S:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020000180 CR3: 0000000068909000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
       test_and_clear_bit include/asm-generic/bitops/instrumented-atomic.h:83 [inline]
       mptcp_release_cb+0x14a/0x210 net/mptcp/protocol.c:3016
       release_sock+0xb4/0x1b0 net/core/sock.c:3204
       mptcp_wait_data net/mptcp/protocol.c:1770 [inline]
       mptcp_recvmsg+0xfd1/0x27b0 net/mptcp/protocol.c:2080
       inet6_recvmsg+0x11b/0x5e0 net/ipv6/af_inet6.c:659
       sock_recvmsg_nosec net/socket.c:944 [inline]
       ____sys_recvmsg+0x527/0x600 net/socket.c:2626
       ___sys_recvmsg+0x127/0x200 net/socket.c:2670
       do_recvmmsg+0x24d/0x6d0 net/socket.c:2764
       __sys_recvmmsg net/socket.c:2843 [inline]
       __do_sys_recvmmsg net/socket.c:2866 [inline]
       __se_sys_recvmmsg net/socket.c:2859 [inline]
       __x64_sys_recvmmsg+0x20b/0x260 net/socket.c:2859
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7fc200d2dc39
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 41 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc5758e5a8 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
      RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fc200d2dc39
      RDX: 0000000000000002 RSI: 00000000200017c0 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000f0b5ff
      R10: 0000000000000100 R11: 0000000000000246 R12: 0000000000000003
      R13: 00007ffc5758e5d0 R14: 00007ffc5758e5c0 R15: 0000000000000003
      
      Fix the issue by replacing the MPTCP_DATA_READY bit with direct
      inspection of the msk receive queue.
      
      Reported-and-tested-by: syzbot+3360da629681aa0d22fe@syzkaller.appspotmail.com
      Fixes: 7a6a6cbc ("mptcp: recvmsg() can drain data from multiple subflow")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      612f71d7
  3. 30 9月, 2021 1 次提交
    • P
      net: introduce and use lock_sock_fast_nested() · 49054556
      Paolo Abeni 提交于
      Syzkaller reported a false positive deadlock involving
      the nl socket lock and the subflow socket lock:
      
      MPTCP: kernel_bind error, err=-98
      ============================================
      WARNING: possible recursive locking detected
      5.15.0-rc1-syzkaller #0 Not tainted
      --------------------------------------------
      syz-executor998/6520 is trying to acquire lock:
      ffff8880795718a0 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
      
      but task is already holding lock:
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
      ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(k-sk_lock-AF_INET);
        lock(k-sk_lock-AF_INET);
      
       *** DEADLOCK ***
      
       May be due to missing lock nesting notation
      
      3 locks held by syz-executor998/6520:
       #0: ffffffff8d176c50 (cb_lock){++++}-{3:3}, at: genl_rcv+0x15/0x40 net/netlink/genetlink.c:802
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_lock net/netlink/genetlink.c:33 [inline]
       #1: ffffffff8d176d08 (genl_mutex){+.+.}-{3:3}, at: genl_rcv_msg+0x3e0/0x580 net/netlink/genetlink.c:790
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock include/net/sock.h:1612 [inline]
       #2: ffff8880787c8c60 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: mptcp_close+0x23/0x7b0 net/mptcp/protocol.c:2720
      
      stack backtrace:
      CPU: 1 PID: 6520 Comm: syz-executor998 Not tainted 5.15.0-rc1-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_deadlock_bug kernel/locking/lockdep.c:2944 [inline]
       check_deadlock kernel/locking/lockdep.c:2987 [inline]
       validate_chain kernel/locking/lockdep.c:3776 [inline]
       __lock_acquire.cold+0x149/0x3ab kernel/locking/lockdep.c:5015
       lock_acquire kernel/locking/lockdep.c:5625 [inline]
       lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
       lock_sock_fast+0x36/0x100 net/core/sock.c:3229
       mptcp_close+0x267/0x7b0 net/mptcp/protocol.c:2738
       inet_release+0x12e/0x280 net/ipv4/af_inet.c:431
       __sock_release net/socket.c:649 [inline]
       sock_release+0x87/0x1b0 net/socket.c:677
       mptcp_pm_nl_create_listen_socket+0x238/0x2c0 net/mptcp/pm_netlink.c:900
       mptcp_nl_cmd_add_addr+0x359/0x930 net/mptcp/pm_netlink.c:1170
       genl_family_rcv_msg_doit+0x228/0x320 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x328/0x580 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2504
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x86d/0xdb0 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_no_sendpage+0x101/0x150 net/core/sock.c:2980
       kernel_sendpage.part.0+0x1a0/0x340 net/socket.c:3504
       kernel_sendpage net/socket.c:3501 [inline]
       sock_sendpage+0xe5/0x140 net/socket.c:1003
       pipe_to_sendpage+0x2ad/0x380 fs/splice.c:364
       splice_from_pipe_feed fs/splice.c:418 [inline]
       __splice_from_pipe+0x43e/0x8a0 fs/splice.c:562
       splice_from_pipe fs/splice.c:597 [inline]
       generic_splice_sendpage+0xd4/0x140 fs/splice.c:746
       do_splice_from fs/splice.c:767 [inline]
       direct_splice_actor+0x110/0x180 fs/splice.c:936
       splice_direct_to_actor+0x34b/0x8c0 fs/splice.c:891
       do_splice_direct+0x1b3/0x280 fs/splice.c:979
       do_sendfile+0xae9/0x1240 fs/read_write.c:1249
       __do_sys_sendfile64 fs/read_write.c:1314 [inline]
       __se_sys_sendfile64 fs/read_write.c:1300 [inline]
       __x64_sys_sendfile64+0x1cc/0x210 fs/read_write.c:1300
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x7f215cb69969
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 14 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007ffc96bb3868 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
      RAX: ffffffffffffffda RBX: 00007f215cbad072 RCX: 00007f215cb69969
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000005
      RBP: 0000000000000000 R08: 00007ffc96bb3a08 R09: 00007ffc96bb3a08
      R10: 0000000100000002 R11: 0000000000000246 R12: 00007ffc96bb387c
      R13: 431bde82d7b634db R14: 0000000000000000 R15: 0000000000000000
      
      the problem originates from uncorrect lock annotation in the mptcp
      code and is only visible since commit 2dcb96ba ("net: core: Correct
      the sock::sk_lock.owned lockdep annotations"), but is present since
      the port-based endpoint support initial implementation.
      
      This patch addresses the issue introducing a nested variant of
      lock_sock_fast() and using it in the relevant code path.
      
      Fixes: 1729cf18 ("mptcp: create the listening socket for new port")
      Fixes: 2dcb96ba ("net: core: Correct the sock::sk_lock.owned lockdep annotations")
      Suggested-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-and-tested-by: syzbot+1dd53f7a89b299d59eaf@syzkaller.appspotmail.com
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      49054556
  4. 25 9月, 2021 4 次提交
  5. 23 9月, 2021 2 次提交
    • P
      mptcp: stop relying on tcp_tx_skb_cache · f70cad10
      Paolo Abeni 提交于
      We want to revert the skb TX cache, but MPTCP is currently
      using it unconditionally.
      
      Rework the MPTCP tx code, so that tcp_tx_skb_cache is not
      needed anymore: do the whole coalescing check, skb allocation
      skb initialization/update inside mptcp_sendmsg_frag(), quite
      alike the current TCP code.
      Reviewed-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f70cad10
    • P
      mptcp: ensure tx skbs always have the MPTCP ext · efe686ff
      Paolo Abeni 提交于
      Due to signed/unsigned comparison, the expression:
      
      	info->size_goal - skb->len > 0
      
      evaluates to true when the size goal is smaller than the
      skb size. That results in lack of tx cache refill, so that
      the skb allocated by the core TCP code lacks the required
      MPTCP skb extensions.
      
      Due to the above, syzbot is able to trigger the following WARN_ON():
      
      WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Modules linked in:
      CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
      RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
      RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
      RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
      RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
      R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
      FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
       mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
       release_sock+0xb4/0x1b0 net/core/sock.c:3206
       sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
       mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
       inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2163 [inline]
       new_sync_write+0x40b/0x640 fs/read_write.c:507
       vfs_write+0x7cf/0xae0 fs/read_write.c:594
       ksys_write+0x1ee/0x250 fs/read_write.c:647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
      RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
      R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
      
      Fix the issue rewriting the relevant expression to avoid
      sign-related problems - note: size_goal is always >= 0.
      
      Additionally, ensure that the skb in the tx cache always carries
      the relevant extension.
      
      Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
      Fixes: 1094c6fe ("mptcp: fix possible divide by zero")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efe686ff
  6. 22 9月, 2021 1 次提交
    • P
      mptcp: ensure tx skbs always have the MPTCP ext · 977d293e
      Paolo Abeni 提交于
      Due to signed/unsigned comparison, the expression:
      
      	info->size_goal - skb->len > 0
      
      evaluates to true when the size goal is smaller than the
      skb size. That results in lack of tx cache refill, so that
      the skb allocated by the core TCP code lacks the required
      MPTCP skb extensions.
      
      Due to the above, syzbot is able to trigger the following WARN_ON():
      
      WARNING: CPU: 1 PID: 810 at net/mptcp/protocol.c:1366 mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Modules linked in:
      CPU: 1 PID: 810 Comm: syz-executor.4 Not tainted 5.14.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:mptcp_sendmsg_frag+0x1362/0x1bc0 net/mptcp/protocol.c:1366
      Code: ff 4c 8b 74 24 50 48 8b 5c 24 58 e9 0f fb ff ff e8 13 44 8b f8 4c 89 e7 45 31 ed e8 98 57 2e fe e9 81 f4 ff ff e8 fe 43 8b f8 <0f> 0b 41 bd ea ff ff ff e9 6f f4 ff ff 4c 89 e7 e8 b9 8e d2 f8 e9
      RSP: 0018:ffffc9000531f6a0 EFLAGS: 00010216
      RAX: 000000000000697f RBX: 0000000000000000 RCX: ffffc90012107000
      RDX: 0000000000040000 RSI: ffffffff88eac9e2 RDI: 0000000000000003
      RBP: ffff888078b15780 R08: 0000000000000000 R09: 0000000000000000
      R10: ffffffff88eac017 R11: 0000000000000000 R12: ffff88801de0a280
      R13: 0000000000006b58 R14: ffff888066278280 R15: ffff88803c2fe9c0
      FS:  00007fd9f866e700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007faebcb2f718 CR3: 00000000267cb000 CR4: 00000000001506e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       __mptcp_push_pending+0x1fb/0x6b0 net/mptcp/protocol.c:1547
       mptcp_release_cb+0xfe/0x210 net/mptcp/protocol.c:3003
       release_sock+0xb4/0x1b0 net/core/sock.c:3206
       sk_stream_wait_memory+0x604/0xed0 net/core/stream.c:145
       mptcp_sendmsg+0xc39/0x1bc0 net/mptcp/protocol.c:1749
       inet6_sendmsg+0x99/0xe0 net/ipv6/af_inet6.c:643
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x2a0/0x3e0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2163 [inline]
       new_sync_write+0x40b/0x640 fs/read_write.c:507
       vfs_write+0x7cf/0xae0 fs/read_write.c:594
       ksys_write+0x1ee/0x250 fs/read_write.c:647
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      RIP: 0033:0x4665f9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fd9f866e188 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000000000056c038 RCX: 00000000004665f9
      RDX: 00000000000e7b78 RSI: 0000000020000000 RDI: 0000000000000003
      RBP: 00000000004bfcc4 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056c038
      R13: 0000000000a9fb1f R14: 00007fd9f866e300 R15: 0000000000022000
      
      Fix the issue rewriting the relevant expression to avoid
      sign-related problems - note: size_goal is always >= 0.
      
      Additionally, ensure that the skb in the tx cache always carries
      the relevant extension.
      
      Reported-and-tested-by: syzbot+263a248eec3e875baa7b@syzkaller.appspotmail.com
      Fixes: 1094c6fe ("mptcp: fix possible divide by zero")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      977d293e
  7. 03 9月, 2021 1 次提交
    • M
      mptcp: Only send extra TCP acks in eligible socket states · 340fa666
      Mat Martineau 提交于
      Recent changes exposed a bug where specifically-timed requests to the
      path manager netlink API could trigger a divide-by-zero in
      __tcp_select_window(), as syzkaller does:
      
      divide error: 0000 [#1] SMP KASAN NOPTI
      CPU: 0 PID: 9667 Comm: syz-executor.0 Not tainted 5.14.0-rc6+ #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      RIP: 0010:__tcp_select_window+0x509/0xa60 net/ipv4/tcp_output.c:3016
      Code: 44 89 ff e8 c9 29 e9 fd 45 39 e7 0f 8d 20 ff ff ff e8 db 28 e9 fd 44 89 e3 e9 13 ff ff ff e8 ce 28 e9 fd 44 89 e0 44 89 e3 99 <f7> 7c 24 04 29 d3 e9 fc fe ff ff e8 b7 28 e9 fd 44 89 f1 48 89 ea
      RSP: 0018:ffff888031ccf020 EFLAGS: 00010216
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000040000
      RDX: 0000000000000000 RSI: ffff88811532c080 RDI: 0000000000000002
      RBP: 0000000000000000 R08: ffffffff835807c2 R09: 0000000000000000
      R10: 0000000000000004 R11: ffffed1020b92441 R12: 0000000000000000
      R13: 1ffff11006399e08 R14: 0000000000000000 R15: 0000000000000000
      FS:  00007fa4c8344700(0000) GS:ffff88811ae00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2f424000 CR3: 000000003e4e2003 CR4: 0000000000770ef0
      PKRU: 55555554
      Call Trace:
       tcp_select_window net/ipv4/tcp_output.c:264 [inline]
       __tcp_transmit_skb+0xc00/0x37a0 net/ipv4/tcp_output.c:1351
       __tcp_send_ack.part.0+0x3ec/0x760 net/ipv4/tcp_output.c:3972
       __tcp_send_ack net/ipv4/tcp_output.c:3978 [inline]
       tcp_send_ack+0x7d/0xa0 net/ipv4/tcp_output.c:3978
       mptcp_pm_nl_addr_send_ack+0x1ab/0x380 net/mptcp/pm_netlink.c:654
       mptcp_pm_remove_addr+0x161/0x200 net/mptcp/pm.c:58
       mptcp_nl_remove_id_zero_address+0x197/0x460 net/mptcp/pm_netlink.c:1328
       mptcp_nl_cmd_del_addr+0x98b/0xd40 net/mptcp/pm_netlink.c:1359
       genl_family_rcv_msg_doit.isra.0+0x225/0x340 net/netlink/genetlink.c:731
       genl_family_rcv_msg net/netlink/genetlink.c:775 [inline]
       genl_rcv_msg+0x341/0x5b0 net/netlink/genetlink.c:792
       netlink_rcv_skb+0x148/0x430 net/netlink/af_netlink.c:2504
       genl_rcv+0x24/0x40 net/netlink/genetlink.c:803
       netlink_unicast_kernel net/netlink/af_netlink.c:1314 [inline]
       netlink_unicast+0x537/0x750 net/netlink/af_netlink.c:1340
       netlink_sendmsg+0x846/0xd80 net/netlink/af_netlink.c:1929
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0x14e/0x190 net/socket.c:724
       ____sys_sendmsg+0x709/0x870 net/socket.c:2403
       ___sys_sendmsg+0xff/0x170 net/socket.c:2457
       __sys_sendmsg+0xe5/0x1b0 net/socket.c:2486
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      mptcp_pm_nl_addr_send_ack() was attempting to send a TCP ACK on the
      first subflow in the MPTCP socket's connection list without validating
      that the subflow was in a suitable connection state. To address this,
      always validate subflow state when sending extra ACKs on subflows
      for address advertisement or subflow priority change.
      
      Fixes: 84dfe367 ("mptcp: send out dedicated ADD_ADDR packet")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/229Co-developed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Acked-by: NGeliang Tang <geliangtang@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      340fa666
  8. 02 9月, 2021 1 次提交
    • P
      mptcp: fix possible divide by zero · 1094c6fe
      Paolo Abeni 提交于
      Florian noted that if mptcp_alloc_tx_skb() allocation fails
      in __mptcp_push_pending(), we can end-up invoking
      mptcp_push_release()/tcp_push() with a zero mss, causing
      a divide by 0 error.
      
      This change addresses the issue refactoring the skb allocation
      code checking if skb collapsing will happen for sure and doing
      the skb allocation only after such check. Skb allocation will
      now happen only after the call to tcp_send_mss() which
      correctly initializes mss_now.
      
      As side bonuses we now fill the skb tx cache only when needed,
      and this also clean-up a bit the output path.
      
      v1 -> v2:
       - use lockdep_assert_held_once() - Jakub
       - fix indentation - Jakub
      Reported-by: NFlorian Westphal <fw@strlen.de>
      Fixes: 724cfd2e ("mptcp: allocate TX skbs in msk context")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NJakub Kicinski <kuba@kernel.org>
      1094c6fe
  9. 27 8月, 2021 2 次提交
  10. 14 8月, 2021 5 次提交
  11. 10 7月, 2021 1 次提交
  12. 29 6月, 2021 1 次提交
  13. 23 6月, 2021 6 次提交
  14. 19 6月, 2021 4 次提交
  15. 11 6月, 2021 3 次提交
    • P
      mptcp: fix soft lookup in subflow_error_report() · 499ada50
      Paolo Abeni 提交于
      Maxim reported a soft lookup in subflow_error_report():
      
       watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
       RIP: 0010:native_queued_spin_lock_slowpath
       RSP: 0018:ffffa859c0003bc0 EFLAGS: 00000202
       RAX: 0000000000000101 RBX: 0000000000000001 RCX: 0000000000000000
       RDX: ffff9195c2772d88 RSI: 0000000000000000 RDI: ffff9195c2772d88
       RBP: ffff9195c2772d00 R08: 00000000000067b0 R09: c6e31da9eb1e44f4
       R10: ffff9195ef379700 R11: ffff9195edb50710 R12: ffff9195c2772d88
       R13: ffff9195f500e3d0 R14: ffff9195ef379700 R15: ffff9195ef379700
       FS:  0000000000000000(0000) GS:ffff91961f400000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 000000c000407000 CR3: 0000000002988000 CR4: 00000000000006f0
       Call Trace:
        <IRQ>
       _raw_spin_lock_bh
       subflow_error_report
       mptcp_subflow_data_available
       __mptcp_move_skbs_from_subflow
       mptcp_data_ready
       tcp_data_queue
       tcp_rcv_established
       tcp_v4_do_rcv
       tcp_v4_rcv
       ip_protocol_deliver_rcu
       ip_local_deliver_finish
       __netif_receive_skb_one_core
       netif_receive_skb
       rtl8139_poll 8139too
       __napi_poll
       net_rx_action
       __do_softirq
       __irq_exit_rcu
       common_interrupt
        </IRQ>
      
      The calling function - mptcp_subflow_data_available() - can be invoked
      from different contexts:
      - plain ssk socket lock
      - ssk socket lock + mptcp_data_lock
      - ssk socket lock + mptcp_data_lock + msk socket lock.
      
      Since subflow_error_report() tries to acquire the mptcp_data_lock, the
      latter two call chains will cause soft lookup.
      
      This change addresses the issue moving the error reporting call to
      outer functions, where the held locks list is known and the we can
      acquire only the needed one.
      Reported-by: NMaxim Galaganov <max@internet.ru>
      Fixes: 15cc1045 ("mptcp: deliver ssk errors to msk")
      Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/199Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      499ada50
    • P
      mptcp: wake-up readers only for in sequence data · 99d1055c
      Paolo Abeni 提交于
      Currently we rely on the subflow->data_avail field, which is subject to
      races:
      
      	ssk1
      		skb len = 500 DSS(seq=1, len=1000, off=0)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk2
      		skb len = 500 DSS(seq = 501, len=1000)
      		# data_avail == MPTCP_SUBFLOW_DATA_AVAIL
      
      	ssk1
      		skb len = 500 DSS(seq = 1, len=1000, off =500)
      		# still data_avail == MPTCP_SUBFLOW_DATA_AVAIL,
      		# as the skb is covered by a pre-existing map,
      		# which was in-sequence at reception time.
      
      Instead we can explicitly check if some has been received in-sequence,
      propagating the info from __mptcp_move_skbs_from_subflow().
      
      Additionally add the 'ONCE' annotation to the 'data_avail' memory
      access, as msk will read it outside the subflow socket lock.
      
      Fixes: 648ef4b8 ("mptcp: Implement MPTCP receive path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      99d1055c
    • P
      mptcp: try harder to borrow memory from subflow under pressure · 72f96132
      Paolo Abeni 提交于
      If the host is under sever memory pressure, and RX forward
      memory allocation for the msk fails, we try to borrow the
      required memory from the ingress subflow.
      
      The current attempt is a bit flaky: if skb->truesize is less
      than SK_MEM_QUANTUM, the ssk will not release any memory, and
      the next schedule will fail again.
      
      Instead, directly move the required amount of pages from the
      ssk to the msk, if available
      
      Fixes: 9c3f94e1 ("mptcp: add missing memory scheduling in the rx path")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NMat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      72f96132
  16. 05 6月, 2021 1 次提交
  17. 29 5月, 2021 1 次提交
  18. 26 5月, 2021 1 次提交
  19. 12 5月, 2021 1 次提交
  20. 24 4月, 2021 2 次提交