1. 14 11月, 2009 1 次提交
    • V
      sctp: Set source addresses on the association before adding transports · 409b95af
      Vlad Yasevich 提交于
      Recent commit 8da645e1
      	sctp: Get rid of an extra routing lookup when adding a transport
      introduced a regression in the connection setup.  The behavior was
      
      different between IPv4 and IPv6.  IPv4 case ended up working because the
      route lookup routing returned a NULL route, which triggered another
      route lookup later in the output patch that succeeded.  In the IPv6 case,
      a valid route was returned for first call, but we could not find a valid
      source address at the time since the source addresses were not set on the
      association yet.  Thus resulted in a hung connection.
      
      The solution is to set the source addresses on the association prior to
      adding peers.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      409b95af
  2. 01 10月, 2009 1 次提交
  3. 05 9月, 2009 7 次提交
    • W
      sctp: turn flags in 'struct sctp_association' into bit fields · 9237ccbc
      Wei Yongjun 提交于
      This shrinks the size of struct sctp_association a little.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      9237ccbc
    • B
      sctp: Sysctl configuration for IPv4 Address Scoping · 72388433
      Bhaskar Dutta 提交于
      This patch introduces a new sysctl option to make IPv4 Address Scoping
      configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
      
      In networking environments where DNAT rules in iptables prerouting
      chains convert destination IP's to link-local/private IP addresses,
      SCTP connections fail to establish as the INIT chunk is dropped by the
      kernel due to address scope match failure.
      For example to support overlapping IP addresses (same IP address with
      different vlan id) a Layer-5 application listens on link local IP's,
      and there is a DNAT rule that maps the destination IP to a link local
      IP. Such applications never get the SCTP INIT if the address-scoping
      draft is strictly followed.
      
      This sysctl configuration allows SCTP to function in such
      unconventional networking environments.
      
      Sysctl options:
      0 - Disable IPv4 address scoping draft altogether
      1 - Enable IPv4 address scoping (default, current behavior)
      2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
      3 - Enable address scoping but allow IPv4 link local address in init/init-ack
      Signed-off-by: NBhaskar Dutta <bhaskar.dutta@globallogic.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      72388433
    • V
      sctp: Turn flags in 'sctp_packet' into bit fields · a803c942
      Vlad Yasevich 提交于
      This shrinks the size of sctp_packet a little.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      a803c942
    • V
      sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05
      Vlad Yasevich 提交于
      We had a bug that we never stored the user-defined value for
      MAXSEG when setting the value on an association.  Thus future
      PMTU events ended up re-writing the frag point and increasing
      it past user limit.  Additionally, when setting the option on
      the socket/endpoint, we effect all current associations, which
      is against spec.
      
      Now, we store the user 'maxseg' value along with the computed
      'frag_point'.  We inherit 'maxseg' from the socket at association
      creation and use it as an upper limit for 'frag_point' when its
      set.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      f68b2e05
    • V
      sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Vlad Yasevich 提交于
      SCTP will delay the last part of a large write due to NAGLE, if that
      part is smaller then MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting untill the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better to not delay it.
      
      This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      cb95ea32
    • V
      sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6
      Vlad Yasevich 提交于
      SCTP has a problem that when small chunks are used, it is possible
      to exhaust the receiver buffer without fully closing receive window.
      This happens due to all overhead that we have account for with small
      messages.  To fix this, when receive buffer is exceeded, we'll drop
      the window to 0 and save the 'drop' portion.  When application starts
      reading data and freeing up recevie buffer space, we'll wait until
      we've reached the 'drop' window and then add back this 'drop' one
      mtu at a time.  This worked well in testing and under stress produced
      rather even recovery.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      4d3c46e6
    • V
      sctp: Send user messages to the lower layer as one · 9c5c62be
      Vlad Yasevich 提交于
      Currenlty, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a sigle packet (ex: due
      to user setting a low fragmentation threashold).
      
      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue
      thus causing the chunks to get transmitted.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      9c5c62be
  4. 03 6月, 2009 1 次提交
    • W
      sctp: fix to choose alternate destination when retransmit ASCONF chunk · 9919b455
      Wei Yongjun 提交于
      RFC 5061 Section 5.1 ASCONF Chunk Procedures said:
      
      B4)  Re-transmit the ASCONF Chunk last sent and if possible choose an
           alternate destination address (please refer to [RFC4960],
           Section 6.4.1).  An endpoint MUST NOT add new parameters to this
           chunk; it MUST be the same (including its Sequence Number) as
           the last ASCONF sent.  An endpoint MAY, however, bundle an
           additional ASCONF with new ASCONF parameters with the next
           Sequence Number.  For details, see Section 5.5.
      
      This patch fix to choose an alternate destination address when
      re-transmit the ASCONF chunk, with some dup codes cleanup.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      9919b455
  5. 16 2月, 2009 2 次提交
    • V
      sctp: Fix the RTO-doubling on idle-link heartbeats · faee47cd
      Vlad Yasevich 提交于
      SCTP incorrectly doubles rto ever time a Hearbeat chunk
      is generated.   However RFC 4960 states:
      
         On an idle destination address that is allowed to heartbeat, it is
         recommended that a HEARTBEAT chunk is sent once per RTO of that
         destination address plus the protocol parameter 'HB.interval', with
         jittering of +/- 50% of the RTO value, and exponential backoff of the
         RTO if the previous HEARTBEAT is unanswered.
      
      Essentially, of if the heartbean is unacknowledged, do we double the RTO.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      faee47cd
    • L
      sctp: Allow to disable SCTP checksums via module parameter · 06e86806
      Lucas Nussbaum 提交于
      This is a new version of my patch, now using a module parameter instead
      of a sysctl, so that the option is harder to find. Please note that,
      once the module is loaded, it is still possible to change the value of
      the parameter in /sys/module/sctp/parameters/, which is useful if you
      want to do performance comparisons without rebooting.
      
      Computation of SCTP checksums significantly affects the performance of
      SCTP. For example, using two dual-Opteron 246 connected using a Gbe
      network, it was not possible to achieve more than ~730 Mbps, compared to
      941 Mbps after disabling SCTP checksums.
      Unfortunately, SCTP checksum offloading in NICs is not commonly
      available (yet).
      
      By default, checksums are still enabled, of course.
      Signed-off-by: NLucas Nussbaum <lucas.nussbaum@ens-lyon.fr>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      06e86806
  6. 09 10月, 2008 1 次提交
    • V
      sctp: Rework the tsn map to use generic bitmap. · 8e1ee18c
      Vlad Yasevich 提交于
      The tsn map currently use is 4K large and is stuck inside
      the sctp_association structure making memory references REALLY
      expensive.  What we really need is at most 4K worth of bits
      so the biggest map we would have is 512 bytes.   Also, the
      map is only really usefull when we have gaps to store and
      report.  As such, starting with minimal map of say 32 TSNs (bits)
      should be enough for normal low-loss operations.  We can grow
      the map by some multiple of 32 along with some extra room any
      time we receive the TSN which would put us outside of the map
      boundry.  As we close gaps, we can shift the map to rebase
      it on the latest TSN we've seen.  This saves 4088 bytes per
      association just in the map alone along savings from the now
      unnecessary structure members.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e1ee18c
  7. 01 10月, 2008 2 次提交
  8. 04 8月, 2008 1 次提交
    • H
      sctp: Drop ipfargok in sctp_xmit function · f880374c
      Herbert Xu 提交于
      The ipfragok flag controls whether the packet may be fragmented
      either on the local host on beyond.  The latter is only valid on
      IPv4.
      
      In fact, we never want to do the latter even on IPv4 when PMTU is
      enabled.  This is because even though we can't fragment packets
      within SCTP due to the prtocol's inherent faults, we can still
      fragment it at IP layer.  By setting the DF bit we will improve
      the PMTU process.
      
      RFC 2960 only says that we SHOULD clear the DF bit in this case,
      so we're compliant even if we set the DF bit.  In fact RFC 4960
      no longer has this statement.
      
      Once we make this change, we only need to control the local
      fragmentation.  There is already a bit in the skb which controls
      that, local_df.  So this patch sets that instead of using the
      ipfragok argument.
      
      The only complication is that there isn't a struct sock object
      per transport, so for IPv4 we have to resort to changing the
      pmtudisc field for every packet.  This should be safe though
      as the protocol is single-threaded.
      
      Note that after this patch we can remove ipfragok from the rest
      of the stack too.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f880374c
  9. 23 7月, 2008 1 次提交
  10. 19 7月, 2008 1 次提交
  11. 20 6月, 2008 1 次提交
    • V
      sctp: Follow security requirement of responding with 1 packet · 2e3216cd
      Vlad Yasevich 提交于
      RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
      
      When an SCTP stack receives a packet containing multiple control or
      DATA chunks and the processing of the packet requires the sending of
      multiple chunks in response, the sender of the response chunk(s) MUST
      NOT send more than one packet.  If bundling is supported, multiple
      response chunks that fit into a single packet MAY be bundled together
      into one single response packet.  If bundling is not supported, then
      the sender MUST NOT send more than one response chunk and MUST
      discard all other responses.  Note that this rule does NOT apply to a
      SACK chunk, since a SACK chunk is, in itself, a response to DATA and
      a SACK does not require a response of more DATA.
      
      We implement this by not servicing our outqueue until we reach the end
      of the packet.  This enables maximum bundling.  We also identify
      'response' chunks and make sure that we only send 1 packet when sending
      such chunks.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2e3216cd
  12. 05 6月, 2008 4 次提交
  13. 10 5月, 2008 1 次提交
  14. 24 3月, 2008 1 次提交
  15. 01 3月, 2008 1 次提交
  16. 05 2月, 2008 1 次提交
  17. 29 1月, 2008 5 次提交
  18. 21 12月, 2007 1 次提交
  19. 07 12月, 2007 1 次提交
  20. 10 11月, 2007 1 次提交
  21. 08 11月, 2007 4 次提交
  22. 11 10月, 2007 1 次提交