1. 23 Jun, 2021 6 commits
    • sctp: do state transition when a probe succeeds on HB ACK recv path · b87641af
      Xin Long committed
      As described in rfc8899#section-5.2, when a probe succeeds, there might
      be the following state transitions:
      
        - Base -> Search, occurs when probe succeeds with BASE_PLPMTU,
          pl.pmtu is not changing,
          pl.probe_size increases by SCTP_PL_BIG_STEP,
      
        - Error -> Search, occurs when probe succeeds with BASE_PLPMTU,
          pl.pmtu is changed from SCTP_MIN_PLPMTU to SCTP_BASE_PLPMTU,
          pl.probe_size increases by SCTP_PL_BIG_STEP.
      
        - Search -> Search Complete, occurs when probe succeeds with the probe
          size SCTP_MAX_PLPMTU less than pl.probe_high,
          pl.pmtu is not changing, but update *pathmtu* with it,
          pl.probe_size is set back to pl.pmtu to double check it.
      
        - Search Complete -> Search, occurs when probe succeeds with the probe
          size equal to pl.pmtu,
          pl.pmtu is not changing,
          pl.probe_size increases by SCTP_PL_MIN_STEP.
      
      So the search process can be described as:
      
       1. When it first enters the 'Search' state, *pathmtu* is not updated
          with pl.pmtu, and probe_size increases by a big step
          (SCTP_PL_BIG_STEP) each round.
      
       2. This continues until a probe fails, at which point pl.probe_high is
          set and probe_size decreases back to pl.pmtu, as described in the
          last patch.
      
       3. When the probe with the new size succeeds, probe_size switches to
          increasing by a small step (SCTP_PL_MIN_STEP) because pl.probe_high
          is set.
      
       4. Once probe_size is next to pl.probe_high, the search finishes: it
          goes to the 'Complete' state, updates *pathmtu* with pl.pmtu, and
          sets probe_size to pl.pmtu to confirm it with one more probe.
      
       5. This probe occurs after "30 * probe_interval", a much longer time
          than in the Search state. Once it is done, it goes to the 'Search'
          state again with probe_size increased by SCTP_PL_MIN_STEP.
      
      As we can see above, during the search pl.pmtu changes while *pathmtu*
      doesn't. *pathmtu* is only updated when the search finishes, by which
      point an optimal value has been found. A big step is used at the
      beginning until it gets close to the optimal value, then a small step
      is used until it reaches this optimal value.
      
      The small step is also used in 'Complete' until it goes to the 'Search'
      state again and the probe with 'pmtu + the small step' succeeds, which
      means a higher size could be used. Then probe_size goes back to
      increasing by a big step until it gets close to the next optimal value.
      
      Note that whenever a black hole is detected, it goes directly to the
      'Base' state with pl.pmtu set to SCTP_BASE_PLPMTU, as described in the
      last patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
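The success-path transitions listed in this commit message can be sketched as a small standalone state machine. This is a minimal illustration rather than the kernel code: `struct pl_state`, `enum pl_phase`, and `pl_probe_succeeded()` are hypothetical names, the numeric constant values are assumptions, and header lengths are ignored when *pathmtu* is updated. The sketch also assumes that every successful probe in the Search state is recorded in `pmtu`, which is one reading of "pl.pmtu changes during the search".

```c
#include <stdint.h>

/* Constant names come from the commit message; values are assumed. */
#define SCTP_BASE_PLPMTU  1200
#define SCTP_PL_BIG_STEP  32
#define SCTP_PL_MIN_STEP  4

enum pl_phase { PL_BASE, PL_ERROR, PL_SEARCH, PL_COMPLETE };

/* Hypothetical stand-in for the per-transport PLPMTUD fields. */
struct pl_state {
	enum pl_phase phase;
	uint32_t pmtu;        /* largest size confirmed so far */
	uint32_t probe_size;  /* size of the next probe */
	uint32_t probe_high;  /* smallest size known to fail, 0 if unset */
	uint32_t pathmtu;     /* value actually used for sending */
};

/* Apply the "probe succeeded" transitions described above. */
void pl_probe_succeeded(struct pl_state *pl)
{
	switch (pl->phase) {
	case PL_BASE:                             /* Base -> Search */
		pl->phase = PL_SEARCH;
		pl->probe_size += SCTP_PL_BIG_STEP;
		break;
	case PL_ERROR:                            /* Error -> Search */
		pl->phase = PL_SEARCH;
		pl->pmtu = SCTP_BASE_PLPMTU;
		pl->probe_size += SCTP_PL_BIG_STEP;
		break;
	case PL_SEARCH:
		pl->pmtu = pl->probe_size;            /* assumption: record success */
		if (!pl->probe_high) {                /* no failure seen: big steps */
			pl->probe_size += SCTP_PL_BIG_STEP;
			break;
		}
		pl->probe_size += SCTP_PL_MIN_STEP;   /* upper bound known: small steps */
		if (pl->probe_size >= pl->probe_high) {   /* Search -> Search Complete */
			pl->phase = PL_COMPLETE;
			pl->probe_high = 0;
			pl->pathmtu = pl->pmtu;           /* only now is pathmtu updated */
			pl->probe_size = pl->pmtu;        /* confirm with one more probe */
		}
		break;
	case PL_COMPLETE:                         /* Search Complete -> Search */
		pl->phase = PL_SEARCH;
		pl->probe_size += SCTP_PL_MIN_STEP;
		break;
	}
}
```

Calling `pl_probe_succeeded()` once per acknowledged probe walks through the phases in the order the numbered list describes; `pathmtu` stays untouched until the Complete state is reached.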
    • sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path · 1dc68c19
      Xin Long committed
      The state transition is described in rfc8899#section-5.2.
      PROBE_COUNT == MAX_PROBES means the probe has failed MAX_PROBES times,
      and the state transitions include:
      
        - Base -> Error, occurs when BASE_PLPMTU Confirmation Fails,
          pl.pmtu is set to SCTP_MIN_PLPMTU,
          probe_size is still SCTP_BASE_PLPMTU;
      
        - Search -> Base, occurs when Black Hole Detected,
          pl.pmtu is set to SCTP_BASE_PLPMTU,
          probe_size is set back to SCTP_BASE_PLPMTU;
      
        - Search Complete -> Base, occurs when Black Hole Detected
          pl.pmtu is set to SCTP_BASE_PLPMTU,
          probe_size is set back to SCTP_BASE_PLPMTU;
      
      Note that a black hole is encountered when a sender is unaware that
      packets are not being delivered to the destination endpoint. So it
      covers probe failures where probe_size equals pl.pmtu, and definitely
      does not cover failures where probe_size is greater than pl.pmtu. The
      latter is the normal probe failure, where probe_size should decrease
      back to pl.pmtu and pl.probe_high is set. pl.probe_high will be used on
      the HB ACK recv path in the next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
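The failure-path transitions in this commit message can be sketched in the same standalone style. This is an illustration under assumptions, not the kernel implementation: `struct pl_state`, `enum pl_phase`, and `pl_probe_failed()` are hypothetical, the numeric constant values are assumed, and the sketch counts failures directly instead of hooking into the HB send path. Clearing `probe_high` on a black hole is also an assumption; the commit message does not say what happens to it.

```c
#include <stdint.h>

/* Constant names come from the commit message; values are assumed. */
#define SCTP_MIN_PLPMTU   512
#define SCTP_BASE_PLPMTU  1200
#define SCTP_MAX_PROBES   3

enum pl_phase { PL_BASE, PL_ERROR, PL_SEARCH, PL_COMPLETE };

/* Hypothetical stand-in for the per-transport PLPMTUD fields. */
struct pl_state {
	enum pl_phase phase;
	uint32_t pmtu;         /* largest size confirmed so far */
	uint32_t probe_size;   /* size of the outstanding probe */
	uint32_t probe_high;   /* smallest size known to fail, 0 if unset */
	uint32_t probe_count;  /* consecutive unacknowledged probes */
};

/* Called each time a probe goes unacknowledged. */
void pl_probe_failed(struct pl_state *pl)
{
	if (++pl->probe_count < SCTP_MAX_PROBES)
		return;                               /* retry the same size */
	pl->probe_count = 0;

	if (pl->phase == PL_BASE) {               /* Base -> Error */
		pl->phase = PL_ERROR;
		pl->pmtu = SCTP_MIN_PLPMTU;           /* probe_size stays at BASE */
	} else if (pl->phase == PL_SEARCH || pl->phase == PL_COMPLETE) {
		if (pl->probe_size == pl->pmtu) {     /* black hole -> Base */
			pl->phase = PL_BASE;
			pl->pmtu = SCTP_BASE_PLPMTU;
			pl->probe_size = SCTP_BASE_PLPMTU;
			pl->probe_high = 0;               /* assumption: drop old bound */
		} else {                              /* normal failure above pmtu */
			pl->probe_high = pl->probe_size;
			pl->probe_size = pl->pmtu;
		}
	}
}
```

The `probe_size == pmtu` test is what separates a black hole from an ordinary failed probe: only a size that was previously confirmed and now fails indicates that the path has changed.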
    • sctp: do the basic send and recv for PLPMTUD probe · fe59379b
      Xin Long committed
      This patch does exactly what rfc8899#section-6.2.1.2 says:
      
         The SCTP sender needs to be able to determine the total size of a
         probe packet.  The HEARTBEAT chunk could carry a Heartbeat
         Information parameter that includes, besides the information
         suggested in [RFC4960], the probe size to help an implementation
         associate a HEARTBEAT ACK with the size of probe that was sent.  The
         sender could also use other methods, such as sending a nonce and
         verifying the information returned also contains the corresponding
         nonce.  The length of the PAD chunk is computed by reducing the
         probing size by the size of the SCTP common header and the HEARTBEAT
         chunk.
      
      Note that the HB ACK chunk will carry back whatever the HB chunk
      carried, including the probe_size we put in it. We also check
      hbinfo->probe_size in the HB ACK against link->pl.probe_size to
      validate this HB ACK chunk.
      
      v1->v2:
        - Remove the unused 'sp' and add static for sctp_packet_bundle_pad().
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
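The PAD length rule quoted from the RFC, and the probe_size check on the HB ACK, amount to simple arithmetic. The sketch below assumes that the probe size counts only the SCTP packet (no IP header) and takes the HEARTBEAT chunk length as a parameter, since it depends on the Heartbeat Information carried. The function names `probe_pad_len()` and `hb_ack_matches_probe()` are hypothetical; only the 12-byte SCTP common header size is a fixed protocol fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SCTP_COMMON_HDR_LEN 12  /* SCTP common header size in bytes */

/* Bytes the PAD chunk must occupy so the packet reaches probe_size. */
size_t probe_pad_len(size_t probe_size, size_t hb_chunk_len)
{
	size_t used = SCTP_COMMON_HDR_LEN + hb_chunk_len;

	assert(probe_size >= used);   /* a probe cannot be smaller than its HB */
	return probe_size - used;
}

/* Accept a HB ACK as a probe reply only if it echoes the pending size. */
bool hb_ack_matches_probe(uint32_t echoed_probe_size, uint32_t pending_probe_size)
{
	return pending_probe_size != 0 && echoed_probe_size == pending_probe_size;
}
```

For example, with a hypothetical 52-byte HEARTBEAT chunk, `probe_pad_len(1200, 52)` leaves 1136 bytes for the PAD chunk.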
    • sctp: add the probe timer in transport for PLPMTUD · 92548ec2
      Xin Long committed
      There are 3 timers described in rfc8899#section-5.1.1:
      
        PROBE_TIMER, PMTU_RAISE_TIMER, CONFIRMATION_TIMER
      
      This patch adds a 'probe_timer' in transport, and it works as either
      PROBE_TIMER or PMTU_RAISE_TIMER. Most of the time, it works as
      PROBE_TIMER and expires every 'probe_interval' to send the HB probe
      packet. When transport pl enters COMPLETE state, it works as
      PMTU_RAISE_TIMER and expires after 'probe_interval * 30' to go back to
      SEARCH state and search again.
      
      Since SCTP HB is an acknowledged packet, CONFIRMATION_TIMER is not
      needed.
      
      The timer will start when transport pl enters BASE state and stop
      when it enters DISABLED state.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
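The dual role of the single timer can be summarized as a function from state to delay. This is a sketch only: `enum pl_phase` and `probe_timer_delay_ms()` are hypothetical names, and returning 0 to mean "timer not running" is a convention chosen for the example.

```c
#include <stdint.h>

enum pl_phase { PL_DISABLED, PL_BASE, PL_SEARCH, PL_COMPLETE, PL_ERROR };

/* Delay before the next expiry, in milliseconds; 0 means the timer is off. */
uint64_t probe_timer_delay_ms(enum pl_phase phase, uint32_t probe_interval_ms)
{
	if (phase == PL_DISABLED)
		return 0;                                  /* timer stopped */
	if (phase == PL_COMPLETE)
		return (uint64_t)probe_interval_ms * 30;   /* PMTU_RAISE_TIMER role */
	return probe_interval_ms;                      /* PROBE_TIMER role */
}
```

With a 5000 ms interval, the timer fires every 5 seconds while searching and only every 150 seconds once the search is complete.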
    • sctp: add the constants/variables and states and some APIs for transport · d9e2e410
      Xin Long committed
      This patch adds 4 constants described in rfc8899#section-5.1.2:
      
        MAX_PROBES, MIN_PLPMTU, MAX_PLPMTU, BASE_PLPMTU;
      
      And 2 variables described in rfc8899#section-5.1.3:
      
        PROBED_SIZE, PROBE_COUNT;
      
      And 5 states described in rfc8899#section-5.2:
      
        DISABLED, BASE, SEARCH, SEARCH_COMPLETE, ERROR;
      
      And these 4 APIs are used to reset/update PLPMTUD, check if PLPMTUD is
      enabled, and calculate the additional headers length for a transport.
      
      Note that the member 'probe_high' in transport will be set, in later
      patches, to the probe size at which a probe fails.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add probe_interval in sysctl and sock/asoc/transport · d1e462a7
      Xin Long committed
      PLPMTUD can be enabled by doing 'sysctl -w net.sctp.probe_interval=n'.
      'n' is the interval for PLPMTUD probe timer in milliseconds, and it
      can't be less than 5000 if it's not 0.
      
      All asoc/transport's PLPMTUD in a new socket will be enabled by default.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
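The constraint on 'n' can be expressed as a small validity check. The sketch below is not the sysctl handler: `probe_interval_valid()` and `plpmtud_enabled()` are hypothetical helpers, it assumes that 0 means PLPMTUD is disabled, and it does not model any upper bound because the commit message does not mention one.

```c
#include <stdbool.h>
#include <stdint.h>

#define PROBE_INTERVAL_MIN_MS 5000  /* lower bound stated in the commit message */

/* A value is acceptable if it is 0 or at least the minimum interval. */
bool probe_interval_valid(uint32_t interval_ms)
{
	return interval_ms == 0 || interval_ms >= PROBE_INTERVAL_MIN_MS;
}

/* Assumption: a non-zero interval is what turns PLPMTUD on. */
bool plpmtud_enabled(uint32_t interval_ms)
{
	return interval_ms != 0;
}
```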
  2. 06 Nov, 2020 1 commit
  3. 31 Oct, 2020 3 commits
    • sctp: add udphdr to overhead when udp_port is set · f1bfe8b5
      Xin Long committed
      sctp_mtu_payload() is for calculating the frag size before making
      chunks from a msg. So we should only add udphdr size to overhead
      when udp socks are listening, as only then sctp can handle the
      incoming sctp over udp packets and outgoing sctp over udp packets
      will be possible.
      
      Note that we can't do this according to transport->encap_port, as
      different transports may be set to different values, and since the
      chunks are made before the transport is chosen, we cannot meet
      everything rfc6951#section-5.6 recommends.
      
      v1->v2:
        - Add udp_port for sctp_sock to avoid a potential race issue, it
          will be used in xmit path in the next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
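The effect on the fragment size can be shown with a short calculation. This is a sketch, not sctp_mtu_payload() itself: `frag_payload()` and its parameters are hypothetical, and the caller is assumed to pass in the non-UDP overhead (IP header, SCTP common header, chunk header). The only fixed fact used is the 8-byte UDP header.

```c
#include <stdbool.h>
#include <stddef.h>

#define UDP_HDR_LEN 8  /* UDP header size in bytes */

/* Payload bytes available per fragment for a given MTU. */
size_t frag_payload(size_t mtu, size_t base_overhead, bool udp_listening)
{
	size_t overhead = base_overhead;

	if (udp_listening)                /* only then can packets be encapsulated */
		overhead += UDP_HDR_LEN;
	return mtu > overhead ? mtu - overhead : 0;
}
```

Reserving the UDP header whenever the UDP sockets are listening costs 8 bytes per fragment on non-encapsulated transports, but it guarantees that chunks built before the transport is chosen still fit if that transport turns out to use encapsulation.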
    • sctp: allow changing transport encap_port by peer packets · a1dd2cf2
      Xin Long committed
      As rfc6951#section-5.4 says:
      
        "After finding the SCTP association (which
         includes checking the verification tag), the UDP source port MUST be
         stored as the encapsulation port for the destination address the SCTP
         packet is received from (see Section 5.1).
      
         When a non-encapsulated SCTP packet is received by the SCTP stack,
         the encapsulation of outgoing packets belonging to the same
         association and the corresponding destination address MUST be
         disabled."
      
      transport encap_port should be updated by a validated incoming packet's
      udp src port.
      
      We save the udp src port in sctp_input_cb->encap_port, and then update
      the transport in three places:
      
        1. right after vtag is verified, which is required by RFC, and this
           allows the existing transports to be updated by the chunks that
           can only be processed on an asoc.
      
        2. right before processing the 'init' where the transports are added,
           and this allows building a sctp over udp connection by client with
           the server not knowing the remote encap port.
      
        3. when processing ootb_pkt and creating the temporary transport for
           the reply pkt.
      
      Note that sctp_input_cb->header is removed, as it's not used any more
      in sctp.
      
      v1->v2:
        - Change encap_port as __be16 for sctp_input_cb.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    • sctp: add encap_port for netns sock asoc and transport · e8a3001c
      Xin Long committed
      encap_port is added per netns/sock/assoc/transport, and each level
      inherits its encap_port from the one before it by default.
      The transport's encap_port value mostly decides whether a packet
      goes out UDP-encapsulated or not.
      
      This patch also allows users to set netns' encap_port by sysctl.
      
      v1->v2:
        - Change to define encap_port as __be16 for sctp_sock, asoc and
          transport.
      v2->v3:
        - No change.
      v3->v4:
        - Add 'encap_port' entry in ip-sysctl.rst.
      v4->v5:
        - Improve the description of encap_port in ip-sysctl.rst.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  4. 21 Sep, 2020 1 commit
  5. 25 Jul, 2020 1 commit
  6. 20 Jul, 2020 1 commit
  7. 16 Jul, 2020 1 commit
  8. 01 Mar, 2020 1 commit
  9. 24 Nov, 2019 1 commit
    • sctp: cache netns in sctp_ep_common · 31243461
      Xin Long committed
      This patch is to fix a data-race reported by syzbot:
      
        BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj
      
        write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
          sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
          sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
          sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
          inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
          __sys_accept4+0x224/0x430 net/socket.c:1754
          __do_sys_accept net/socket.c:1795 [inline]
          __se_sys_accept net/socket.c:1792 [inline]
          __x64_sys_accept+0x4e/0x60 net/socket.c:1792
          do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
          sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
          rht_key_get_hash include/linux/rhashtable.h:133 [inline]
          rht_key_hashfn include/linux/rhashtable.h:159 [inline]
          rht_head_hashfn include/linux/rhashtable.h:174 [inline]
          head_hashfn lib/rhashtable.c:41 [inline]
          rhashtable_rehash_one lib/rhashtable.c:245 [inline]
          rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
          rhashtable_rehash_table lib/rhashtable.c:316 [inline]
          rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
          process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
          worker_thread+0xa0/0x800 kernel/workqueue.c:2415
          kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      It was caused by rhashtable accessing asoc->base.sk while
      sctp_assoc_migrate is changing its value. However, what rhashtable
      wants is the netns from asoc base.sk, and for an asoc, its netns won't
      change once set. So we can simply fix it by caching the netns at
      creation.
      
      Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
      Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
  10. 09 Nov, 2019 2 commits
    • sctp: add support for Primary Path Switchover · 34515e94
      Xin Long committed
      This is a new feature defined in section 5 of rfc7829: "Primary Path
      Switchover". By introducing a new tunable parameter:
      
        Primary.Switchover.Max.Retrans (PSMR)
      
      The primary path will be changed to another active path when the path
      error counter on the old primary path exceeds PSMR, so that "the SCTP
      sender is allowed to continue data transmission on a new working path
      even when the old primary destination address becomes active again".
      
      This patch is to add this tunable parameter, 'ps_retrans' per netns,
      sock, asoc and transport. It also allows a user to change ps_retrans
      per netns by sysctl, and ps_retrans per sock/asoc/transport will be
      initialized with it.
      
      The check will be done in sctp_do_8_2_transport_strike() when this
      feature is enabled.
      
      Note this feature is disabled by initializing 'ps_retrans' per netns
      as 0xffff by default, and its value can't be less than 'pf_retrans'
      when changing by sysctl.
      
      v3->v4:
        - add define SCTP_PS_RETRANS_MAX 0xffff, and use it on extra2 of
          sysctl 'ps_retrans'.
        - add a new entry for ps_retrans on ip-sysctl.txt.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
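The switchover rule and the sysctl constraint described in this commit message can be sketched as two predicates. This is an illustration, not the check in sctp_do_8_2_transport_strike(): `should_switch_primary()` and `ps_retrans_valid()` are hypothetical helpers, and treating 0xffff as an explicit "disabled" sentinel (rather than just a very high threshold) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCTP_PS_RETRANS_MAX 0xffff  /* named in the commit message */

/* Should the primary path be moved away from this transport? */
bool should_switch_primary(bool is_primary, uint32_t error_count,
			   uint16_t ps_retrans)
{
	if (ps_retrans == SCTP_PS_RETRANS_MAX)
		return false;                 /* assumption: feature disabled */
	return is_primary && error_count > ps_retrans;
}

/* A new ps_retrans value must not be lower than pf_retrans. */
bool ps_retrans_valid(uint32_t ps_retrans, uint32_t pf_retrans)
{
	return ps_retrans >= pf_retrans && ps_retrans <= SCTP_PS_RETRANS_MAX;
}
```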
    • sctp: add pf_expose per netns and sock and asoc · aef587be
      Xin Long committed
      As said in rfc7829, section 3, point 12:
      
        The SCTP stack SHOULD expose the PF state of its destination
        addresses to the ULP as well as provide the means to notify the
        ULP of state transitions of its destination addresses from
        active to PF, and vice versa.  However, it is recommended that
        an SCTP stack implementing SCTP-PF also allows for the ULP to be
        kept ignorant of the PF state of its destinations and the
        associated state transitions, thus allowing for retention of the
        simpler state transition model of [RFC4960] in the ULP.
      
      So the RFC allows not only exposing the PF state to the ULP, but
      also hiding sctp-pf from the ULP.
      
      So this patch is to add pf_expose per netns, sock and asoc. And in
      sctp_assoc_control_transport(), ulp_notify will be set to false if
      asoc->expose is not 'enabled' in next patch.
      
      It also allows a user to change pf_expose per netns by sysctl, and
      pf_expose per sock and asoc will be initialized with it.
      
      Note that pf_expose also works for SCTP_GET_PEER_ADDR_INFO sockopt,
      to not allow a user to query the state of a sctp-pf peer address
      when pf_expose is 'disabled', as said in section 7.3.
      
      v1->v2:
        - Fix a build warning noticed by Nathan Chancellor.
      v2->v3:
        - set pf_expose to UNUSED by default to keep compatible with old
          applications.
      v3->v4:
        - add a new entry for pf_expose on ip-sysctl.txt, as Marcelo suggested.
        - change this patch to 1/5, and move sctp_assoc_control_transport
          change into 2/5, as Marcelo suggested.
        - use SCTP_PF_EXPOSE_UNSET instead of SCTP_PF_EXPOSE_UNUSED, and
          set SCTP_PF_EXPOSE_UNSET to 0 in enum, as Marcelo suggested.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 28 Aug, 2019 1 commit
  12. 20 Aug, 2019 1 commit
  13. 09 Jul, 2019 4 commits
  14. 24 May, 2019 1 commit
  15. 13 Mar, 2019 1 commit
  16. 30 Jan, 2019 2 commits
  17. 04 Dec, 2018 1 commit
  18. 20 Nov, 2018 2 commits
  19. 13 Nov, 2018 2 commits
  20. 16 Oct, 2018 1 commit
    • sctp: use the pmtu from the icmp packet to update transport pathmtu · d805397c
      Xin Long committed
      Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
      is also processing transport pmtu_pending by icmp packets. But it's
      meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.
      
      The right pmtu value should come from the icmp packet, and it would
      be saved into transport->mtu_info in this patch and used later when
      the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.
      
      Besides, without this patch, since pmtu can only be updated correctly
      when an ICMP packet is received while nothing is holding the sock
      lock, the update will take a long time if the sock is busy sending
      packets.
      
      Note that it doesn't process transport->mtu_info in .release_cb(),
      as there is not enough information for the pmtu update, such as which
      asoc or transport it is for. It is not worth traversing all asocs to
      check pmtu_pending. So unlike tcp, sctp does this in the tx path, for
      which mtu_info needs to be atomic_t.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
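The reason mtu_info must be atomic can be illustrated in userspace with C11 atomics: one context records the PMTU from the ICMP packet without any lock, and the transmit path later takes and clears it in a single step. This is a sketch of the pattern only; `struct transport_sketch`, `icmp_record_pmtu()`, and `tx_sync_pmtu()` are hypothetical, and the kernel uses its own atomic_t API rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a transport. */
struct transport_sketch {
	atomic_uint mtu_info;   /* PMTU reported by ICMP, 0 if none pending */
	uint32_t pathmtu;       /* only touched by the tx path */
};

/* ICMP side: may run while the socket is busy, so it only stores. */
void icmp_record_pmtu(struct transport_sketch *t, uint32_t pmtu)
{
	atomic_store(&t->mtu_info, pmtu);
}

/* TX side: fetch and clear the pending value atomically, then apply it. */
bool tx_sync_pmtu(struct transport_sketch *t)
{
	uint32_t pending = atomic_exchange(&t->mtu_info, 0);

	if (pending == 0)
		return false;                 /* nothing reported since last sync */
	t->pathmtu = pending;
	return true;
}
```

Using an exchange rather than a separate load and store means a value written by the ICMP side between the two steps cannot be lost.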
  21. 12 Aug, 2018 2 commits
  22. 04 Jul, 2018 1 commit
    • sctp: add support for dscp and flowlabel per transport · 8a9c58d2
      Xin Long committed
      Like some other per transport params, flowlabel and dscp are added
      in transport, asoc and sctp_sock. By default, transport sets its
      value from asoc's, and asoc does it from sctp_sock. flowlabel
      only works for ipv6 transport.
      
      Besides being passed down in sctp_xmit, they also need to be set in
      flow4/6 before looking up the route in get_dst.
      
      Note that it uses '& 0x100000' to check if flowlabel is set and
      '& 0x1' (tos 1st bit is unused) to check if dscp is set by users,
      so that they could be set to 0 by sockopt in next patch.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
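The "is it set" bits mentioned in the note can be illustrated with a few helpers. The two test masks (0x100000 and 0x1) come from the commit message; the macro and function names, and the value masks (20 bits for the flow label, the upper six bits of the TOS byte for DSCP), are assumptions made for this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLOWLABEL_SET_MASK 0x100000u  /* bit just above the 20-bit label */
#define FLOWLABEL_VAL_MASK 0x0fffffu
#define DSCP_SET_MASK      0x1u       /* lowest TOS bit, treated as unused */
#define DSCP_VAL_MASK      0xfcu      /* upper six bits of the TOS byte */

/* Store a user-supplied flow label together with the "set" marker. */
uint32_t flowlabel_store(uint32_t label)
{
	return (label & FLOWLABEL_VAL_MASK) | FLOWLABEL_SET_MASK;
}

bool flowlabel_is_set(uint32_t stored)
{
	return (stored & FLOWLABEL_SET_MASK) != 0;
}

uint32_t flowlabel_value(uint32_t stored)
{
	return stored & FLOWLABEL_VAL_MASK;
}

/* Same idea for DSCP, packed into one byte. */
uint8_t dscp_store(uint8_t dscp)
{
	return (uint8_t)((dscp & DSCP_VAL_MASK) | DSCP_SET_MASK);
}

bool dscp_is_set(uint8_t stored)
{
	return (stored & DSCP_SET_MASK) != 0;
}
```

The marker bit is what lets a stored value of 0 mean "explicitly set to 0" rather than "never configured", which is the point the note makes about setting these fields to 0 by sockopt.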
  23. 29 Jun, 2018 1 commit
    • sctp: add support for SCTP_REUSE_PORT sockopt · b0e9a2fe
      Xin Long committed
      This feature is actually already supported by sk->sk_reuse which can be
      set by socket level opt SO_REUSEADDR. But it's not working exactly as
      RFC6458 demands in section 8.1.27, like:
      
        - This option only supports one-to-one style SCTP sockets
        - This socket option must not be used after calling bind()
          or sctp_bindx().
      
      Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
      Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
      work in linux.
      
      To separate it from the socket level version, this patch adds 'reuse' in
      sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
      setup limitations that are needed when it is being enabled.
      
      "It should be noted that the behavior of the socket-level socket option
      to reuse ports and/or addresses for SCTP sockets is unspecified", so it
      leaves SO_REUSEADDR as is for the compatibility.
      
      Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
      functionality is nearly identical to SO_REUSEADDR, but with some
      extra restrictions. Here it uses 'reuse' in sctp_sock instead of
      'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
      added in another patch.
      
      Thanks to Neil for making this clear.
      
      v1->v2:
        - add sctp_sk->reuse to separate it from the socket level version.
      v2->v3:
        - improve changelog according to Marcelo's suggestion.
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 22 Jun, 2018 1 commit
    • rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown committed
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small change can require a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 15 Jun, 2018 1 commit