1. 04 10月, 2017 2 次提交
    • M
      sctp: introduce stream scheduler foundations · 5bbbbe32
      Marcelo Ricardo Leitner 提交于
      This patch introduces the hooks necessary to do stream scheduling, as
      per RFC Draft ndata.  It also introduces the first scheduler, which is
      what we do today but now factored out: first come first served (FCFS).
      
      With stream scheduling now we have to track which chunk was enqueued on
      which stream and be able to select another other than the in front of
      the main outqueue. So we introduce a list on sctp_stream_out_ext
      structure for this purpose.
      
      We reuse sctp_chunk->transmitted_list space for the list above, as the
      chunk cannot belong to the two lists at the same time. By using the
      union in there, we can have distinct names for these moments.
      
      sctp_sched_ops are the operations expected to be implemented by each
      scheduler. The dequeueing is a bit particular to this implementation but
      it is to match how we dequeue packets today. We first dequeue and then
      check if it fits the packet and if not, we requeue it at head. Thus why
      we don't have a peek operation but have dequeue_done instead, which is
      called once the chunk can be safely considered as transmitted.
      
      The check removed from sctp_outq_flush is now performed by
      sctp_stream_outq_migrate, which is only called during assoc setup.
      (sctp_sendmsg() also checks for it)
      
      The only operation that is foreseen but not yet added here is a way to
      signalize that a new packet is starting or that the packet is done, for
      round robin scheduler per packet, but is intentionally left to the
      patch that actually implements it.
      
      Support for I-DATA chunks, also described in this RFC, with user message
      interleaving is straightforward as it just requires the schedulers to
      probe for the feature and ignore datamsg boundaries when dequeueing.
      
      See-also: https://tools.ietf.org/html/draft-ietf-tsvwg-sctp-ndata-13Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bbbbe32
    • M
      sctp: introduce struct sctp_stream_out_ext · f952be79
      Marcelo Ricardo Leitner 提交于
      With the stream schedulers, sctp_stream_out will become too big to be
      allocated by kmalloc and as we need to allocate with BH disabled, we
      cannot use __vmalloc in sctp_stream_init().
      
      This patch moves out the stats from sctp_stream_out to
      sctp_stream_out_ext, which will be allocated only when the application
      tries to sendmsg something on it.
      
      Just the introduction of sctp_stream_out_ext would already fix the issue
      described above by splitting the allocation in two. Moving the stats
      to it also reduces the pressure on the allocator as we will ask for less
      memory atomically when creating the socket and we will use GFP_KERNEL
      later.
      
      Then, for stream schedulers, we will just use sctp_stream_out_ext.
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f952be79
  2. 07 8月, 2017 2 次提交
  3. 25 7月, 2017 1 次提交
  4. 01 7月, 2017 1 次提交
  5. 03 6月, 2017 1 次提交
  6. 04 4月, 2017 1 次提交
  7. 02 4月, 2017 1 次提交
  8. 22 3月, 2017 1 次提交
  9. 08 2月, 2017 1 次提交
  10. 19 1月, 2017 1 次提交
  11. 11 1月, 2017 1 次提交
  12. 13 10月, 2016 1 次提交
  13. 30 9月, 2016 2 次提交
    • X
      sctp: change to check peer prsctp_capable when using prsctp polices · be4947bf
      Xin Long 提交于
      Now before using prsctp polices, sctp uses asoc->prsctp_enable to
      check if prsctp is enabled. However asoc->prsctp_enable is set only
      means local host support prsctp, sctp should not abandon packet if
      peer host doesn't enable prsctp.
      
      So this patch is to use asoc->peer.prsctp_capable to check if prsctp
      is enabled on both side, instead of asoc->prsctp_enable, as asoc's
      peer.prsctp_capable is set only when local and peer both enable prsctp.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be4947bf
    • X
      sctp: remove prsctp_param from sctp_chunk · 0605483f
      Xin Long 提交于
      Now sctp uses chunk->prsctp_param to save the prsctp param for all the
      prsctp polices, we didn't need to introduce prsctp_param to sctp_chunk.
      We can just use chunk->sinfo.sinfo_timetolive for RTX and BUF polices,
      and reuse msg->expires_at for TTL policy, as the prsctp polices and old
      expires policy are mutual exclusive.
      
      This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
      expires_at for TTL and chunk's sinfo.sinfo_timetolive for RTX and BUF
      polices.
      
      Note that sctp can't use chunk's sinfo.sinfo_timetolive for TTL policy,
      as it needs a u64 variables to save the expires_at time.
      
      This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
      issue.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0605483f
  14. 23 9月, 2016 1 次提交
  15. 19 9月, 2016 4 次提交
    • X
      sctp: make sctp_outq_flush/tail/uncork return void · 83dbc3d4
      Xin Long 提交于
      sctp_outq_flush return value is meaningless now, this patch is
      to make sctp_outq_flush return void, as well as sctp_outq_fail
      and sctp_outq_uncork.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      83dbc3d4
    • X
      sctp: save transmit error to sk_err in sctp_outq_flush · 64519440
      Xin Long 提交于
      Every time when sctp calls sctp_outq_flush, it sends out the chunks of
      control queue, retransmit queue and data queue. Even if some trunks are
      failed to transmit, it still has to flush all the transports, as it's
      the only chance to clean that transmit_list.
      
      So the latest transmit error here should be returned back. This transmit
      error is an internal error of sctp stack.
      
      I checked all the places where it uses the transmit error (the return
      value of sctp_outq_flush), most of them are actually just save it to
      sk_err.
      
      Except for sctp_assoc/endpoint_bh_rcv, they will drop the chunk if
      it's failed to send a REPLY, which is actually incorrect, as we can't
      be sure the error that sctp_outq_flush returns is from sending that
      REPLY.
      
      So it's meaningless for sctp_outq_flush to return error back.
      
      This patch is to save transmit error to sk_err in sctp_outq_flush, the
      new error can update the old value. Eventually, sctp_wait_for_* would
      check for it.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      64519440
    • X
      sctp: free msg->chunks when sctp_primitive_SEND return err · b61c654f
      Xin Long 提交于
      Last patch "sctp: do not return the transmit err back to sctp_sendmsg"
      made sctp_primitive_SEND return err only when asoc state is unavailable.
      In this case, chunks are not enqueued, they have no chance to be freed if
      we don't take care of them later.
      
      This Patch is actually to revert commit 1cd4d5c4 ("sctp: remove the
      unused sctp_datamsg_free()"), commit 69b5777f ("sctp: hold the chunks
      only after the chunk is enqueued in outq") and commit 8b570dc9 ("sctp:
      only drop the reference on the datamsg after sending a msg"), to use
      sctp_datamsg_free to free the chunks of current msg.
      
      Fixes: 8b570dc9 ("sctp: only drop the reference on the datamsg after sending a msg")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b61c654f
    • X
      sctp: remove the unnecessary state check in sctp_outq_tail · 2c89791e
      Xin Long 提交于
      Data Chunks are only sent by sctp_primitive_SEND, in which sctp checks
      the asoc's state through statetable before calling sctp_outq_tail. So
      there's no need to check the asoc's state again in sctp_outq_tail.
      
      Besides, sctp_do_sm is protected by lock_sock, even if sending msg is
      interrupted by timer events, the event's processes still need to acquire
      lock_sock first. It means no others CMDs can be enqueue into side effect
      list before CMD_SEND_MSG to change asoc->state, so it's safe to remove it.
      
      This patch is to remove redundant asoc->state check from sctp_outq_tail.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c89791e
  16. 12 7月, 2016 1 次提交
    • X
      sctp: implement prsctp PRIO policy · 8dbdf1f5
      Xin Long 提交于
      prsctp PRIO policy is a policy to abandon lower priority chunks when
      asoc doesn't have enough snd buffer, so that the current chunk with
      higher priority can be queued successfully.
      
      Similar to TTL/RTX policy, we will set the priority of the chunk to
      prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
      So if PRIO policy is enabled, msg->expire_at won't work.
      
      asoc->sent_cnt_removable will record how many chunks can be checked to
      remove. If priority policy is enabled, when the chunk is queued into
      the out_queue, we will increase sent_cnt_removable. When the chunk is
      moved to abandon_queue or dequeue and free, we will decrease
      sent_cnt_removable.
      
      In sctp_sendmsg, we will check if there is enough snd buffer for current
      msg and if sent_cnt_removable is not 0. Then try to abandon chunks in
      sctp_prune_prsctp when sendmsg from the retransmit/transmited queue, and
      free chunks from out_queue in right order until the abandon+free size >
      msg_len - sctp_wfree. For the abandon size, we have to wait until it
      sends FORWARD TSN, receives the sack and the chunks are really freed.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8dbdf1f5
  17. 11 4月, 2016 1 次提交
    • M
      sctp: avoid refreshing heartbeat timer too often · ba6f5e33
      Marcelo Ricardo Leitner 提交于
      Currently on high rate SCTP streams the heartbeat timer refresh can
      consume quite a lot of resources as timer updates are costly and it
      contains a random factor, which a) is also costly and b) invalidates
      mod_timer() optimization for not editing a timer to the same value.
      It may even cause the timer to be slightly advanced, for no good reason.
      
      As suggested by David Laight this patch now removes this timer update
      from hot path by leaving the timer on and re-evaluating upon its
      expiration if the heartbeat is still needed or not, similarly to what is
      done for TCP. If it's not needed anymore the timer is re-scheduled to
      the new timeout, considering the time already elapsed.
      
      For this, we now record the last tx timestamp per transport, updated in
      the same spots as hb timer was restarted on tx. Also split up
      sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
      sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
      heartbeat one.
      
      On loopback with MTU of 65535 and data chunks with 1636, so that we
      have a considerable amount of chunks without stressing system calls,
      netperf -t SCTP_STREAM -l 30, perf looked like this before:
      
      Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
        Overhead  Command  Shared Object      Symbol
      +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
         - _raw_write_unlock_irqrestore
            - 96,54% _raw_spin_unlock_irqrestore
               - 36,14% mod_timer
                  + 97,24% sctp_transport_reset_timers
                  + 2,76% sctp_do_sm
               + 33,65% __wake_up_sync_key
               + 28,77% sctp_ulpq_tail_event
               + 1,40% del_timer
            - 1,84% mod_timer
               + 99,03% sctp_transport_reset_timers
               + 0,97% sctp_do_sm
            + 1,50% sctp_ulpq_tail_event
      
      And after this patch, now with netperf -l 60:
      
      Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
        Overhead  Command  Shared Object      Symbol
      +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
      +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            + 49,89% __wake_up_sync_key
            + 45,68% sctp_ulpq_tail_event
            - 2,85% mod_timer
               + 76,51% sctp_transport_reset_t3_rtx
               + 23,49% sctp_do_sm
            + 1,55% del_timer
      +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
      +    2,26%  netperf  [sctp]             [k] sctp_sendmsg
      
      Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
      ~3.7%.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba6f5e33
  18. 21 3月, 2016 1 次提交
  19. 14 3月, 2016 1 次提交
    • M
      sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Marcelo Ricardo Leitner 提交于
      Currently sctp_sendmsg() triggers some calls that will allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit it will allocate a linear skb that will be used to
      construct the packet and this may cause sends to fail due to ENOMEM more
      often than anticipated specially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented and made it trigger ENOMEM way less often.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cea8768f
  20. 07 12月, 2015 2 次提交
  21. 23 7月, 2014 1 次提交
  22. 16 7月, 2014 1 次提交
  23. 03 1月, 2014 1 次提交
    • V
      sctp: Remove outqueue empty state · 619a60ee
      Vlad Yasevich 提交于
      The SCTP outqueue structure maintains a data chunks
      that are pending transmission, the list of chunks that
      are pending a retransmission and a length of data in
      flight.  It also tries to keep the emtpy state so that
      it can performe shutdown sequence or notify user.
      
      The problem is that the empy state is inconsistently
      tracked.  It is possible to completely drain the queue
      without sending anything when using PR-SCTP.  In this
      case, the empty state will not be correctly state as
      report by Jamal Hadi Salim <jhs@mojatatu.com>.  This
      can cause an association to be perminantly stuck in the
      SHUTDOWN_PENDING state.
      
      Additionally, SCTP is incredibly inefficient when setting
      the empty state.  Even though all the data is availaible
      in the outqueue structure, we ignore it and walk a list
      of trasnports.
      
      In the end, we can completely remove the extra empty
      state and figure out if the queue is empty by looking
      at 3 things:  length of pending data, length of in-flight
      data, and exisiting of retransmit data.  All of these
      are already in the strucutre.
      Reported-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      619a60ee
  24. 27 12月, 2013 1 次提交
  25. 07 12月, 2013 1 次提交
  26. 29 11月, 2013 1 次提交
  27. 24 11月, 2013 1 次提交
  28. 10 8月, 2013 1 次提交
  29. 25 7月, 2013 1 次提交
  30. 10 7月, 2013 1 次提交
    • D
      net: sctp: confirm route during forward progress · 8c2f414a
      Daniel Borkmann 提交于
      This fix has been proposed originally by Vlad Yasevich. He says:
      
        When SCTP makes forward progress (receives a SACK that acks new chunks,
        renegs, or answeres 0-window probes) or when HB-ACK arrives, mark
        the route as confirmed so we don't unnecessarily send NUD probes.
      
      Having a simple SCTP client/server that exchange data chunks every 1sec,
      without this patch ARP requests are sent periodically every 40-60sec.
      With this fix applied, an ARP request is only done once right at the
      "session" beginning. Also, when clearing the related ARP cache entry
      manually during the session, a new request is correctly done. I have
      only "backported" this to net-next and tested that it works, so full
      credit goes to Vlad.
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c2f414a
  31. 02 7月, 2013 2 次提交
    • D
      net: sctp: get rid of SCTP_DBG_TSNS entirely · e02010ad
      Daniel Borkmann 提交于
      After having reworked the debugging framework, Neil and Vlad agreed to
      get rid of the leftover SCTP_DBG_TSNS code for a couple of reasons:
      
      We can use systemtap scripts to investigate these things, we now have
      pr_debug() helpers that make life easier, and if we really need anything
      else besides those tools, we will be forced to come up with something
      better than we have there. Therefore, get rid of this ifdef debugging
      code entirely for now.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e02010ad
    • D
      net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann 提交于
      We should get rid of all own SCTP debug printk macros and use the ones
      that the kernel offers anyway instead. This makes the code more readable
      and conform to the kernel code, and offers all the features of dynamic
      debbuging that pr_debug() et al has, such as only turning on/off portions
      of debug messages at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completly turned off, then these
      statements will also compile into "empty" functions.
      
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs resp. where appropriate BUG_ONs so
      that those bugs can be more easily detected as probably not many people
      have SCTP debugging permanently turned on.
      
      To turn on all SCTP debugging, the following steps are needed:
      
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
      
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb33381d
  32. 14 6月, 2013 1 次提交
    • N
      sctp: fully initialize sctp_outq in sctp_outq_init · c5c7774d
      Neil Horman 提交于
      In commit 2f94aabd
      (refactor sctp_outq_teardown to insure proper re-initalization)
      we modified sctp_outq_teardown to use sctp_outq_init to fully re-initalize the
      outq structure.  Steve West recently asked me why I removed the q->error = 0
      initalization from sctp_outq_teardown.  I did so because I was operating under
      the impression that sctp_outq_init would properly initalize that value for us,
      but it doesn't.  sctp_outq_init operates under the assumption that the outq
      struct is all 0's (as it is when called from sctp_association_init), but using
      it in __sctp_outq_teardown violates that assumption. We should do a memset in
      sctp_outq_init to ensure that the entire structure is in a known state there
      instead.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: N"West, Steve (NSN - US/Fort Worth)" <steve.west@nsn.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      CC: davem@davemloft.net
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5c7774d