1. 23 2月, 2019 1 次提交
  2. 21 9月, 2018 1 次提交
    • X
      sctp: update dst pmtu with the correct daddr · d7ab5cdc
      Xin Long 提交于
      When processing pmtu update from an icmp packet, it calls .update_pmtu
      with sk instead of skb in sctp_transport_update_pmtu.
      
      However for sctp, the daddr in the transport might be different from
      inet_sock->inet_daddr or sk->sk_v6_daddr, which is used to update or
      create the route cache. The incorrect daddr will cause a different
      route cache created for the path.
      
      So before calling .update_pmtu, inet_sock->inet_daddr/sk->sk_v6_daddr
      should be updated with the daddr in the transport, and update it back
      after it's done.
      
      The issue has existed since route exceptions introduction.
      
      Fixes: 4895c771 ("ipv4: Add FIB nexthop exceptions.")
      Reported-by: ian.periam@dialogic.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7ab5cdc
  3. 04 7月, 2018 1 次提交
  4. 05 6月, 2018 1 次提交
  5. 28 4月, 2018 3 次提交
  6. 09 1月, 2018 1 次提交
    • M
      sctp: fix the handling of ICMP Frag Needed for too small MTUs · b6c5734d
      Marcelo Ricardo Leitner 提交于
      syzbot reported a hang involving SCTP, on which it kept flooding dmesg
      with the message:
      [  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
      low, using default minimum of 512
      
      That happened because whenever SCTP hits an ICMP Frag Needed, it tries
      to adjust to the new MTU and triggers an immediate retransmission. But
      it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
      allowed (512) would not cause the PMTU to change, and issued the
      retransmission anyway (thus leading to another ICMP Frag Needed, and so
      on).
      
      As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
      are higher than that, sctp_transport_update_pmtu() is changed to
      re-fetch the PMTU that got set after our request, and with that, detect
      if there was an actual change or not.
      
      The fix, thus, skips the immediate retransmission if the received ICMP
      resulted in no change, in the hope that SCTP will select another path.
      
      Note: The value being used for the minimum MTU (512,
      SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
      SCTP_MIN_PMTU), but such change belongs to another patch.
      
      Changes from v1:
      - do not disable PMTU discovery, in the light of commit
      06ad3919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
      and as suggested by Xin Long.
      - changed the way to break the rtx loop by detecting if the icmp
        resulted in a change or not
      Changes from v2:
      none
      
      See-also: https://lkml.org/lkml/2017/12/22/811Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6c5734d
  7. 25 10月, 2017 1 次提交
  8. 07 8月, 2017 1 次提交
  9. 05 7月, 2017 1 次提交
  10. 26 6月, 2017 4 次提交
  11. 05 4月, 2017 1 次提交
  12. 28 2月, 2017 1 次提交
  13. 08 2月, 2017 1 次提交
  14. 19 1月, 2017 1 次提交
    • X
      sctp: add stream reconf timer · 7b9438de
      Xin Long 提交于
      This patch is to add a per transport timer based on sctp timer frame
      for stream reconf chunk retransmission. It would start after sending
      a reconf request chunk, and stop after receiving the response chunk.
      
      If the timer expires, besides retransmitting the reconf request chunk,
      it would also do the same thing with data RTO timer. like to increase
      the appropriate error counts, and perform threshold management, possibly
      destroying the asoc if sctp retransmission thresholds are exceeded, just
      as section 5.1.1 describes.
      
      This patch is also to add asoc strreset_chunk, it is used to save the
      reconf request chunk, so that it can be retransmitted, and to check if
      the response is really for this request by comparing the information
      inside with the response chunk as well.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b9438de
  15. 26 12月, 2016 1 次提交
    • T
      ktime: Cleanup ktime_set() usage · 8b0e1953
      Thomas Gleixner 提交于
      ktime_set(S,N) was required for the timespec storage type and is still
      useful for situations where a Seconds and Nanoseconds part of a time value
      needs to be converted. For anything where the Seconds argument is 0, this
      is pointless and can be replaced with a simple assignment.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8b0e1953
  16. 22 9月, 2016 1 次提交
  17. 11 4月, 2016 1 次提交
    • M
      sctp: avoid refreshing heartbeat timer too often · ba6f5e33
      Marcelo Ricardo Leitner 提交于
      Currently on high rate SCTP streams the heartbeat timer refresh can
      consume quite a lot of resources as timer updates are costly and it
      contains a random factor, which a) is also costly and b) invalidates
      mod_timer() optimization for not editing a timer to the same value.
      It may even cause the timer to be slightly advanced, for no good reason.
      
      As suggested by David Laight this patch now removes this timer update
      from hot path by leaving the timer on and re-evaluating upon its
      expiration if the heartbeat is still needed or not, similarly to what is
      done for TCP. If it's not needed anymore the timer is re-scheduled to
      the new timeout, considering the time already elapsed.
      
      For this, we now record the last tx timestamp per transport, updated in
      the same spots as hb timer was restarted on tx. Also split up
      sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
      sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
      heartbeat one.
      
      On loopback with MTU of 65535 and data chunks with 1636, so that we
      have a considerable amount of chunks without stressing system calls,
      netperf -t SCTP_STREAM -l 30, perf looked like this before:
      
      Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
        Overhead  Command  Shared Object      Symbol
      +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
         - _raw_write_unlock_irqrestore
            - 96,54% _raw_spin_unlock_irqrestore
               - 36,14% mod_timer
                  + 97,24% sctp_transport_reset_timers
                  + 2,76% sctp_do_sm
               + 33,65% __wake_up_sync_key
               + 28,77% sctp_ulpq_tail_event
               + 1,40% del_timer
            - 1,84% mod_timer
               + 99,03% sctp_transport_reset_timers
               + 0,97% sctp_do_sm
            + 1,50% sctp_ulpq_tail_event
      
      And after this patch, now with netperf -l 60:
      
      Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
        Overhead  Command  Shared Object      Symbol
      +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
      +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            + 49,89% __wake_up_sync_key
            + 45,68% sctp_ulpq_tail_event
            - 2,85% mod_timer
               + 76,51% sctp_transport_reset_t3_rtx
               + 23,49% sctp_do_sm
            + 1,55% del_timer
      +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
      +    2,26%  netperf  [sctp]             [k] sctp_sendmsg
      
      Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
      ~3.7%.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba6f5e33
  18. 21 3月, 2016 1 次提交
    • M
      sctp: align MTU to a word · 3822a5ff
      Marcelo Ricardo Leitner 提交于
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      Same applies to icmp frag needed messages, which is also fixed by this
      patch.
      
      One other effect of this is the inability to send MTU-sized packet
      without queueing or fragmentation and without hitting Nagle. As the
      check performed at sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      with the above example of MTU, if there are no other messages queued,
      one cannot send a packet that just fits one packet (65532 bytes) and
      without causing DATA chunk fragmentation or a delay.
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3822a5ff
  19. 14 3月, 2016 1 次提交
    • X
      sctp: fix the transports round robin issue when init is retransmitted · 39d2adeb
      Xin Long 提交于
      prior to this patch, at the beginning if we have two paths in one assoc,
      they may have the same params other than the last_time_heard, it will try
      the paths like this:
      
      1st cycle
        try trans1 fail.
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      2nd cycle:
        try  trans2 fail
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      3rd cycle:
        try  trans2 fail
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      ....
      
      trans1 will never have change to be selected, which is not what we expect.
      we should keeping round robin all the paths if they are just added at the
      beginning.
      
      So at first every tranport's last_time_heard should be initialized 0, so
      that we ensure they have the same value at the beginning, only by this,
      all the transports could get equal chance to be selected.
      
      Then for sctp_trans_elect_best, it should return the trans_next one when
      *trans == *trans_next, so that we can try next if it fails,  but now it
      always return trans. so we can fix it by exchanging these two params when
      we calls sctp_trans_elect_tie().
      
      Fixes: 4c47af4d ('net: sctp: rework multihoming retransmission path selection to rfc4960')
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39d2adeb
  20. 29 1月, 2016 2 次提交
  21. 10 11月, 2015 1 次提交
    • A
      remove abs64() · 79211c8e
      Andrew Morton 提交于
      Switch everything to the new and more capable implementation of abs().
      Mainly to give the new abs() a bit of a workout.
      
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79211c8e
  22. 01 8月, 2014 1 次提交
    • J
      sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe 提交于
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a sycall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zerod when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      299ee123
  23. 03 7月, 2014 1 次提交
    • D
      net: sctp: improve timer slack calculation for transport HBs · 8f61059a
      Daniel Borkmann 提交于
      RFC4960, section 8.3 says:
      
        On an idle destination address that is allowed to heartbeat,
        it is recommended that a HEARTBEAT chunk is sent once per RTO
        of that destination address plus the protocol parameter
        'HB.interval', with jittering of +/- 50% of the RTO value,
        and exponential backoff of the RTO if the previous HEARTBEAT
        is unanswered.
      
      Currently, we calculate jitter via sctp_jitter() function first,
      and then add its result to the current RTO for the new timeout:
      
        TMO = RTO + (RAND() % RTO) - (RTO / 2)
                    `------------------------^-=> sctp_jitter()
      
      Instead, we can just simplify all this by directly calculating:
      
        TMO = (RTO / 2) + (RAND() % RTO)
      
      With the help of prandom_u32_max(), we don't need to open code
      our own global PRNG, but can instead just make use of the per
      CPU implementation of prandom with better quality numbers. Also,
      we can now spare us the conditional for divide by zero check
      since no div or mod operation needs to be used. Note that
      prandom_u32_max() won't emit the same result as a mod operation,
      but we really don't care here as we only want to have a random
      number scaled into RTO interval.
      
      Note, exponential RTO backoff is handeled elsewhere, namely in
      sctp_do_8_2_transport_strike().
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f61059a
  24. 12 6月, 2014 1 次提交
  25. 14 2月, 2014 1 次提交
  26. 07 12月, 2013 1 次提交
  27. 06 12月, 2013 1 次提交
  28. 13 8月, 2013 1 次提交
  29. 10 8月, 2013 1 次提交
  30. 25 7月, 2013 1 次提交
  31. 02 7月, 2013 1 次提交
    • D
      net: sctp: rework debugging framework to use pr_debug and friends · bb33381d
      Daniel Borkmann 提交于
      We should get rid of all own SCTP debug printk macros and use the ones
      that the kernel offers anyway instead. This makes the code more readable
      and conform to the kernel code, and offers all the features of dynamic
      debbuging that pr_debug() et al has, such as only turning on/off portions
      of debug messages at runtime through debugfs. The runtime cost of having
      CONFIG_DYNAMIC_DEBUG enabled, but none of the debug statements printing,
      is negligible [1]. If kernel debugging is completly turned off, then these
      statements will also compile into "empty" functions.
      
      While we're at it, we also need to change the Kconfig option as it /now/
      only refers to the ifdef'ed code portions in outqueue.c that enable further
      debugging/tracing of SCTP transaction fields. Also, since SCTP_ASSERT code
      was enabled with this Kconfig option and has now been removed, we
      transform those code parts into WARNs resp. where appropriate BUG_ONs so
      that those bugs can be more easily detected as probably not many people
      have SCTP debugging permanently turned on.
      
      To turn on all SCTP debugging, the following steps are needed:
      
       # mount -t debugfs none /sys/kernel/debug
       # echo -n 'module sctp +p' > /sys/kernel/debug/dynamic_debug/control
      
      This can be done more fine-grained on a per file, per line basis and others
      as described in [2].
      
       [1] https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
       [2] Documentation/dynamic-debug-howto.txt
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bb33381d
  32. 18 6月, 2013 1 次提交
  33. 18 4月, 2013 1 次提交
  34. 05 2月, 2013 1 次提交