1. 17 November 2020 (9 commits)
    • mptcp: rework poll+nospace handling · 8edf0864
      Florian Westphal committed
      MPTCP maintains a status bit, MPTCP_SEND_SPACE, that is set when at
      least one subflow and the mptcp socket itself are writeable.
      
      mptcp_poll returns EPOLLOUT if the bit is set.
      
      mptcp_sendmsg makes sure MPTCP_SEND_SPACE gets cleared when the last
      write has used up all subflows or the mptcp socket wmem.
      
      This reworks nospace handling as follows:
      
      MPTCP_SEND_SPACE is replaced with MPTCP_NOSPACE, with the inverted
      meaning: the bit is set when the mptcp socket is not writeable.
      The mptcp-level ack path will then schedule the mptcp worker
      to allow it to free already-acked data (and reduce wmem usage).
      
      This will then wake userspace processes that wait for a POLLOUT event.
      
      sendmsg will set MPTCP_NOSPACE only when it has to wait for more
      wmem (blocking I/O case).
      
      poll path will set MPTCP_NOSPACE in case the mptcp socket is
      not writeable.
      
      Normal tcp-level notification (SOCK_NOSPACE) is only enabled
      in case the subflow socket has no available wmem.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
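      To make the inverted-bit scheme above concrete, here is a minimal
      userspace C sketch of the pattern; the flag word, sketch_poll() and
      sketch_ack_path() are hypothetical stand-ins, not the kernel code:

      ```c
      #include <poll.h>
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical flag word standing in for the msk flag bits. */
      #define MPTCP_NOSPACE (1u << 0)

      static _Atomic unsigned int msk_flags;

      /* poll path: report POLLOUT only while writeable; otherwise arm
       * MPTCP_NOSPACE so the ack path knows to wake us later. */
      static unsigned int sketch_poll(bool sk_writeable)
      {
          if (sk_writeable)
              return POLLOUT;
          atomic_fetch_or(&msk_flags, MPTCP_NOSPACE);
          return 0;
      }

      /* mptcp-level ack path: after freeing acked data, clear the bit and
       * wake any POLLOUT waiters if it was set. */
      static void sketch_ack_path(void)
      {
          unsigned int old = atomic_fetch_and(&msk_flags, ~MPTCP_NOSPACE);

          if (old & MPTCP_NOSPACE)
              printf("wake POLLOUT waiters\n");
      }

      int main(void)
      {
          printf("poll -> %#x\n", sketch_poll(false)); /* arms MPTCP_NOSPACE */
          sketch_ack_path();                           /* wakes waiters */
          return 0;
      }
      ```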
    • mptcp: try to push pending data on snd una updates · 813e0a68
      Paolo Abeni committed
      After the previous patch we may end up with unsent data
      in the write buffer. If that buffer is full, the writer
      will block indefinitely.
      
      We need to trigger the MPTCP xmit path even from the
      subflow rx path, on MPTCP snd_una updates.
      
      Keep things simple and just schedule the work queue if
      needed.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
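      A minimal sketch of the scheme above, with hypothetical stand-ins
      (struct msk_sketch and friends are not the kernel types): the rx-side
      snd_una update only schedules the worker when data is still pending:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical stand-ins for msk state; not the kernel structures. */
      struct msk_sketch {
          unsigned long long snd_una;   /* oldest unacked MPTCP sequence */
          unsigned long long write_seq; /* end of data queued by sendmsg() */
          bool work_scheduled;
      };

      /* rx path: a subflow ack advanced MPTCP-level snd_una. If data is
       * still pending, kick the worker so it can push it; keep it simple,
       * no direct xmit from here. */
      static void sketch_snd_una_update(struct msk_sketch *msk,
                                        unsigned long long new_una)
      {
          msk->snd_una = new_una;
          if (msk->snd_una != msk->write_seq && !msk->work_scheduled) {
              msk->work_scheduled = true;
              printf("schedule mptcp worker to push pending data\n");
          }
      }

      int main(void)
      {
          struct msk_sketch msk = { .snd_una = 0, .write_seq = 100 };

          sketch_snd_una_update(&msk, 50); /* pending data -> schedule worker */
          return 0;
      }
      ```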
    • mptcp: move page frag allocation in mptcp_sendmsg() · d9ca1de8
      Paolo Abeni committed
      mptcp_sendmsg() is refactored so that it first copies
      the data provided from user space into the send queue,
      and then tries to spool the send queue via sendmsg_frag.
      
      There is a subtle change in the mptcp-level collapsing of
      consecutive data fragments: we now allow it only on unsent
      data.
      
      The spooling path no longer needs to deal with msghdr data
      and can be simplified considerably.
      
      snd_nxt and write_seq are now tracked independently.
      
      Overall this allows some significant cleanup and will
      allow sending pending mptcp data on msk snd_una updates in a
      later patch.
      Co-developed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
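      A rough userspace sketch of the two-phase flow described above,
      assuming hypothetical types (the real code uses page frags and the
      msk write queue): data is first copied into the queue, collapsing
      only into unsent tail entries, then spooled without touching msghdr:

      ```c
      #include <stddef.h>
      #include <stdio.h>
      #include <string.h>

      /* Hypothetical pending-data entry; the real code uses page frags. */
      struct frag_sketch {
          char data[64];
          size_t len;
          int sent; /* only unsent entries may be collapsed/extended */
      };

      static struct frag_sketch queue[16];
      static size_t queue_len;

      /* Phase 1: copy user data into the send queue, collapsing into the
       * tail entry only if it is still unsent. */
      static void sketch_enqueue(const char *buf, size_t len)
      {
          struct frag_sketch *tail = queue_len ? &queue[queue_len - 1] : NULL;

          if (tail && !tail->sent && tail->len + len <= sizeof(tail->data)) {
              memcpy(tail->data + tail->len, buf, len);
              tail->len += len;
              return;
          }
          memcpy(queue[queue_len].data, buf, len);
          queue[queue_len].len = len;
          queue[queue_len].sent = 0;
          queue_len++;
      }

      /* Phase 2: spool the queue; no msghdr in sight anymore. */
      static void sketch_spool(void)
      {
          for (size_t i = 0; i < queue_len; i++)
              if (!queue[i].sent) {
                  printf("xmit %zu bytes\n", queue[i].len);
                  queue[i].sent = 1;
              }
      }

      int main(void)
      {
          sketch_enqueue("hello", 5);
          sketch_enqueue(" world", 6); /* collapses into the unsent tail */
          sketch_spool();
          return 0;
      }
      ```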
    • mptcp: refactor shutdown and close · e16163b6
      Paolo Abeni committed
      We must not close the subflows before all the MPTCP-level
      data, including the DATA_FIN, has been acked at the MPTCP
      level; otherwise we could be unable to retransmit as needed.
      
      __mptcp_wr_shutdown() is responsible for checking for the
      correct status and closing all subflows. It is called by the
      output path after spooling any data and at shutdown/close time.
      
      In a similar way, __mptcp_destroy_sock() is responsible for
      cleaning up the MPTCP-level status, and is called when the msk
      transitions to TCP_CLOSE.
      
      The protocol-level close() no longer forces the TCP_CLOSE
      status, but orphans the msk socket and all the subflows.
      Orphaned msk sockets are forcibly closed after a timeout or
      once all MPTCP-level data has been acked.
      
      There is a caveat about keeping the orphaned subflows around:
      the TCP stack can asynchronously call tcp_cleanup_ulp() on them
      via tcp_close(). To prevent accessing freed memory in later
      MPTCP-level operations, the msk acquires a reference to each
      subflow socket and prevents subflow_ulp_release() from releasing
      the subflow context before __mptcp_destroy_sock().
      
      The additional subflow references are released by __mptcp_done(),
      and the async ULP release is detected by checking the ULP ops
      field. If that field has already been cleared by the ULP release
      path, the dangling context is freed directly by __mptcp_done().
      Co-developed-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
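      The subflow-context lifetime rule can be sketched with a plain
      reference count; everything below is a hypothetical miniature,
      not the kernel API:

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical miniature of the lifetime rule: the msk holds one
       * reference per subflow; if the TCP side released the ULP first
       * (ops == NULL), the msk frees the dangling context itself. */
      struct subflow_ctx {
          const void *ulp_ops; /* cleared by the async ULP release path */
          int refcnt;
      };

      static void ctx_put(struct subflow_ctx *ctx)
      {
          if (--ctx->refcnt == 0) {
              printf("free subflow context\n");
              free(ctx);
          }
      }

      /* TCP side: a tcp_cleanup_ulp()-like path may run first, async. */
      static void sketch_ulp_release(struct subflow_ctx *ctx)
      {
          ctx->ulp_ops = NULL; /* mark: TCP already let go */
          ctx_put(ctx);
      }

      /* MPTCP side: a __mptcp_destroy_sock()-like path drops the msk
       * reference; a cleared ulp_ops tells it the context is dangling. */
      static void sketch_mptcp_destroy(struct subflow_ctx *ctx)
      {
          if (!ctx->ulp_ops)
              printf("ULP released first: dropping dangling context\n");
          ctx_put(ctx);
      }

      int main(void)
      {
          struct subflow_ctx *ctx = malloc(sizeof(*ctx));

          ctx->ulp_ops = (const void *)0x1; /* placeholder ops pointer */
          ctx->refcnt = 2;                  /* one ref for TCP, one for msk */
          sketch_ulp_release(ctx);   /* async TCP release happens first */
          sketch_mptcp_destroy(ctx); /* msk detects it and frees safely */
          return 0;
      }
      ```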
    • mptcp: introduce MPTCP snd_nxt · eaa2ffab
      Paolo Abeni committed
      Track the next MPTCP sequence number used on xmit,
      currently always equal to write_seq.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
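      A tiny hypothetical sketch of the two counters: write_seq advances
      as data is queued, snd_nxt as it is transmitted, and at this point
      in the series they always end up equal:

      ```c
      #include <assert.h>

      /* Hypothetical sketch, not the kernel struct. Later patches let
       * snd_nxt lag behind write_seq when data stays pending. */
      struct seqs_sketch {
          unsigned long long write_seq; /* end of queued data */
          unsigned long long snd_nxt;   /* next sequence to transmit */
      };

      int main(void)
      {
          struct seqs_sketch s = { 0, 0 };

          s.write_seq += 100;      /* sendmsg() queued 100 bytes */
          s.snd_nxt = s.write_seq; /* xmit path pushed everything */
          assert(s.snd_nxt == s.write_seq);
          return 0;
      }
      ```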
    • mptcp: add accounting for pending data · f0e6a4cf
      Paolo Abeni committed
      Preparation patch to track the data pending in the msk
      write queue. No functional change introduced here.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • mptcp: reduce the arguments of mptcp_sendmsg_frag · caf971df
      Paolo Abeni committed
      The current argument list is pretty long and quite unreadable;
      move most of the arguments into a dedicated struct. Later patches
      will add more fields to that struct.
      
      Additionally drop the 'timeo' argument, now unused.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
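      The refactor follows the common argument-struct pattern; a
      hypothetical before/after sketch (names are illustrative, not the
      real signature):

      ```c
      #include <stddef.h>

      /* Before (sketch): a long, unreadable argument list, e.g.
       *   int sendmsg_frag(sk, ssk, data, len, offset, flags, ...);
       * After: collect the send state in one struct; later patches can
       * grow it without touching every call site. */
      struct sendmsg_info_sketch {
          const void *data;
          size_t len;
          size_t offset;
          int flags;
          /* room to grow: sequence numbers, mss, ... */
      };

      static int sketch_sendmsg_frag(void *ssk,
                                     struct sendmsg_info_sketch *info)
      {
          (void)ssk;
          /* bytes left to send (sketch logic only) */
          return (int)(info->len - info->offset);
      }

      int main(void)
      {
          struct sendmsg_info_sketch info = { .data = "x", .len = 1 };

          return sketch_sendmsg_frag(0, &info) == 1 ? 0 : 1;
      }
      ```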
    • mptcp: introduce mptcp_schedule_work · ba8f48f7
      Paolo Abeni committed
      This removes some code duplication and allows preventing
      rescheduling on close.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
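      A minimal sketch of the idea, with hypothetical types: funneling
      all scheduling through one helper gives a single place to refuse
      rescheduling once the socket is closing:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical sketch, not the kernel helper. */
      enum sk_state_sketch { SK_OPEN, SK_CLOSED };

      struct msk_sketch {
          enum sk_state_sketch state;
          bool work_pending;
      };

      static bool sketch_schedule_work(struct msk_sketch *msk)
      {
          if (msk->state == SK_CLOSED) /* central place to veto it */
              return false;
          if (!msk->work_pending) {
              msk->work_pending = true;
              printf("worker queued\n");
          }
          return true;
      }

      int main(void)
      {
          struct msk_sketch msk = { .state = SK_OPEN };

          sketch_schedule_work(&msk); /* queues the worker */
          msk.state = SK_CLOSED;
          sketch_schedule_work(&msk); /* refused after close */
          return 0;
      }
      ```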
    • mptcp: use tcp_build_frag() · e2223995
      Paolo Abeni committed
      mptcp_push_pending() is called even on an orphaned
      msk (and orphaned subflows) if there is outstanding
      data at close() time.
      
      To cope with the above, MPTCP needs to explicitly handle
      allocation failures on xmit. The newly introduced
      tcp_build_frag() allows that; just plug it in.
      
      We can additionally drop a couple of sanity checks,
      duplicated in the TCP code.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
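      A hypothetical sketch of the resulting xmit contract: the frag
      builder may return NULL (e.g. for an orphaned socket), and the
      caller must treat that as "stop and retry later", not as a bug:

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical stand-in for the frag builder; not the kernel API. */
      static void *sketch_build_frag(size_t len, int orphaned)
      {
          if (orphaned)
              return NULL; /* allocation refused/failed */
          return malloc(len);
      }

      static void sketch_push_pending(int orphaned)
      {
          void *frag = sketch_build_frag(128, orphaned);

          if (!frag) {
              printf("xmit alloc failed: leave data pending\n");
              return; /* retry later (worker/timeout), don't crash */
          }
          printf("pushed one fragment\n");
          free(frag);
      }

      int main(void)
      {
          sketch_push_pending(0); /* normal path */
          sketch_push_pending(1); /* orphaned msk: failure handled */
          return 0;
      }
      ```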
  2. 10 November 2020 (1 commit)
  3. 05 November 2020 (4 commits)
  4. 30 October 2020 (1 commit)
  5. 11 October 2020 (1 commit)
  6. 09 October 2020 (1 commit)
    • mptcp: fix infinite loop on recvmsg()/worker() race. · d9fb8c50
      Paolo Abeni committed
      If recvmsg() and the workqueue race to dequeue the data
      pending on some subflow, and the current mapping for such
      subflow covers several skbs some of which have not yet
      reached the receiver, either the worker or recvmsg() can
      find a subflow with the data_avail flag set - since the
      current mapping is valid and in sequence - but no skbs in
      the receive queue - since the other entity just processed
      them.
      
      The above will lead to an unbounded loop in __mptcp_move_skbs()
      and a subsequent hang of any task trying to acquire the msk
      socket lock.
      
      This change addresses the issue by stopping the
      __mptcp_move_skbs() loop as soon as the above race is detected
      (an empty receive queue with data_avail set).
      
      Reported-and-tested-by: syzbot+fcf8ca5817d6e92c6567@syzkaller.appspotmail.com
      Fixes: ab174ad8 ("mptcp: move ooo skbs into msk out of order queue.")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
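      The fix boils down to treating an empty receive queue as a loop
      terminator even when data_avail is set; a hypothetical sketch:

      ```c
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical sketch: data_avail may legitimately be true while
       * the receive queue is empty (the other dequeuer won the race). */
      struct subflow_sketch {
          bool data_avail; /* mapping still valid and in sequence */
          int queued_skbs; /* skbs actually in the receive queue */
      };

      static void sketch_move_skbs(struct subflow_sketch *sf)
      {
          while (sf->data_avail) {
              if (sf->queued_skbs == 0) {
                  /* Race detected: the peer already consumed the skbs.
                   * Before the fix this spun forever. */
                  printf("empty queue with data_avail set: stop\n");
                  break;
              }
              sf->queued_skbs--;
              printf("moved one skb\n");
          }
      }

      int main(void)
      {
          struct subflow_sketch sf = { .data_avail = true, .queued_skbs = 0 };

          sketch_move_skbs(&sf); /* terminates instead of looping forever */
          return 0;
      }
      ```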
  7. 06 October 2020 (1 commit)
  8. 30 September 2020 (1 commit)
  9. 25 September 2020 (3 commits)
  10. 18 September 2020 (1 commit)
  11. 15 September 2020 (8 commits)
  12. 01 September 2020 (1 commit)
  13. 27 August 2020 (1 commit)
  14. 24 August 2020 (1 commit)
  15. 17 August 2020 (1 commit)
  16. 15 August 2020 (1 commit)
  17. 04 August 2020 (2 commits)
  18. 29 July 2020 (2 commits)