1. 16 Jul 2012 (1 commit)
  2. 09 Jul 2012 (1 commit)
    • sctp: refactor sctp_packet_append_chunk and cleanup some memory leaks · ed106277
      Authored by Neil Horman
      While doing some recent work on sctp sack bundling I noted that
      sctp_packet_append_chunk was pretty inefficient.  Specifically, it was called
      recursively while trying to bundle auth and sack chunks.  Because of that we
      call sctp_packet_bundle_sack and sctp_packet_bundle_auth a total of 4 times for
      every call to sctp_packet_append_chunk, knowing that at least 3 of those calls
      will do nothing.
      
      So let's refactor sctp_packet_bundle_auth to have an outer part that does the
      attempted bundling, and an inner part that just does the chunk appends.  This
      saves us several calls per iteration that we just don't need.
      
      Also, I noticed that the auth and sack bundling fail to free the chunks they
      allocate if the append fails, so make sure we add that in.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ed106277
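The outer/inner split described above can be sketched in plain C. This is a hypothetical, simplified model (the struct and function names below are illustrative stand-ins for struct sctp_packet and the real append/bundle helpers): an outer entry point attempts AUTH/SACK bundling exactly once, while the inner helper only appends, so the bundling checks are never re-entered recursively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified stand-ins for the kernel structures. */
struct packet { int nchunks; bool has_auth; bool has_sack; };
enum chunk_type { CHUNK_DATA, CHUNK_SACK, CHUNK_AUTH };

static int calls_to_append; /* counts inner-append invocations */

/* Inner helper: just appends, never tries to bundle anything else.
 * This is the part the old code re-entered recursively. */
static void __packet_append_chunk(struct packet *p, enum chunk_type t)
{
    calls_to_append++;
    if (t == CHUNK_AUTH) p->has_auth = true;
    if (t == CHUNK_SACK) p->has_sack = true;
    p->nchunks++;
}

/* Outer entry point: attempt AUTH/SACK bundling exactly once,
 * then append the caller's chunk via the inner helper. */
static void packet_append_chunk(struct packet *p, enum chunk_type t)
{
    if (t == CHUNK_DATA) {
        if (!p->has_auth) __packet_append_chunk(p, CHUNK_AUTH);
        if (!p->has_sack) __packet_append_chunk(p, CHUNK_SACK);
    }
    __packet_append_chunk(p, t);
}
```

With this shape, appending one DATA chunk costs three inner appends at most, instead of repeated bundle attempts on every recursive call.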
  3. 01 Jul 2012 (1 commit)
    • sctp: be more restrictive in transport selection on bundled sacks · 4244854d
      Authored by Neil Horman
      It was noticed recently that when we send data on a transport, it's possible that
      we might bundle a sack that arrived on a different transport.  While this isn't
      a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
      2960:
      
       An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
         etc.) to the same destination transport address from which it
         received the DATA or control chunk to which it is replying.  This
         rule should also be followed if the endpoint is bundling DATA chunks
         together with the reply chunk.
      
      This patch seeks to correct that.  It restricts the bundling of sack operations
      to only those transports which have moved the ctsn of the association forward
      since the last sack.  By doing this we guarantee that we only bundle outbound
      sacks on a transport that has received a chunk since the last sack.  This brings
      us into stricter compliance with the RFC.
      
      Vlad had initially suggested that we strictly allow only sack bundling on the
      transport that last moved the ctsn forward.  While this makes sense, I was
      concerned that doing so prevented us from bundling in the case where we had
      received chunks that moved the ctsn on multiple transports.  In those cases, the
      RFC allows us to select any of the transports having received chunks to bundle
      the sack on.  So I've modified the approach to allow for that, by adding a state
      variable to each transport that tracks whether it has moved the ctsn since the
      last sack.  This I think keeps our behavior (and performance), close enough to
      our current profile that I think we can do this without a sysctl knob to
      enable/disable it.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Reported-by: Michele Baldessari <michele@redhat.com>
      Reported-by: Sorin Serban <sserban@redhat.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4244854d
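The per-transport state variable this patch describes can be modeled in a few lines of C. This is a simplified sketch, not the kernel code (the names below are hypothetical): a flag records whether a transport advanced the cumulative TSN since the last SACK, bundling is allowed only on flagged transports, and all flags are cleared once a SACK goes out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the per-transport state added by
 * this patch. */
struct transport { bool ctsn_moved; };

/* Called when a DATA chunk received on @t moves the association's
 * cumulative TSN forward. */
static void transport_ctsn_advanced(struct transport *t)
{
    t->ctsn_moved = true;
}

/* Bundling rule: a SACK may ride on @t only if @t moved the ctsn
 * since the last SACK, i.e. we received a chunk on it. */
static bool may_bundle_sack(const struct transport *t)
{
    return t->ctsn_moved;
}

/* After a SACK is emitted, clear the flag on every transport. */
static void sack_sent(struct transport *ts, size_t n)
{
    for (size_t i = 0; i < n; i++)
        ts[i].ctsn_moved = false;
}
```

This keeps the RFC's "reply on the receiving transport" behavior while still permitting bundling on any of several transports that received chunks, as the commit message argues.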
  4. 11 May 2012 (1 commit)
  5. 16 Apr 2012 (1 commit)
  6. 21 Dec 2011 (1 commit)
    • sctp: Do not account for sizeof(struct sk_buff) in estimated rwnd · a76c0adf
      Authored by Thomas Graf
      When checking whether a DATA chunk fits into the estimated rwnd a
      full sizeof(struct sk_buff) is added to the needed chunk size. This
      quickly exhausts the available rwnd space and leads to packets being
      sent which are much below the PMTU limit. This can lead to much worse
      performance.
      
      The reason for this behaviour was to avoid putting too much memory
      pressure on the receiver. The concept is not completely irrational
      because a Linux receiver does in fact clone an skb for each DATA chunk
      delivered. However, Linux also reserves half the available socket
      buffer space for data structures therefore usage of it is already
      accounted for.
      
      When proposing to change this the last time it was noted that this
      behaviour was introduced to solve a performance issue caused by rwnd
      overusage in combination with small DATA chunks.
      
      Trying to reproduce this I found that with the sk_buff overhead removed,
      the performance would improve significantly unless socket buffer limits
      are increased.
      
      The following numbers have been gathered using a patched iperf
      supporting SCTP over a live 1 Gbit ethernet network. The -l option
      was used to limit DATA chunk sizes. The numbers listed are based on
      the average of 3 test runs each. Default values have been used for
      sk_(r|w)mem.
      
      Chunk
      Size    Unpatched     No Overhead
      -------------------------------------
         4    15.2 Kbit [!]   12.2 Mbit [!]
         8    35.8 Kbit [!]   26.0 Mbit [!]
        16    95.5 Kbit [!]   54.4 Mbit [!]
        32   106.7 Mbit      102.3 Mbit
        64   189.2 Mbit      188.3 Mbit
       128   331.2 Mbit      334.8 Mbit
       256   537.7 Mbit      536.0 Mbit
       512   766.9 Mbit      766.6 Mbit
      1024   810.1 Mbit      808.6 Mbit
      Signed-off-by: Thomas Graf <tgraf@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a76c0adf
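The before/after rwnd check can be illustrated with a toy model. The numbers and function names here are illustrative only (SKB_OVERHEAD stands in for sizeof(struct sk_buff), whose real value varies by build); the point is that charging a full sk_buff per small DATA chunk exhausts the advertised window far sooner than charging the chunk alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for sizeof(struct sk_buff). */
#define SKB_OVERHEAD 240

/* Old check: every chunk was charged the full sk_buff size, so small
 * chunks exhausted the peer's advertised window almost immediately. */
static bool fits_rwnd_old(unsigned chunk_len, unsigned rwnd)
{
    return chunk_len + SKB_OVERHEAD <= rwnd;
}

/* New check: charge only the chunk itself; receiver-side memory
 * pressure is already covered by socket-buffer accounting, since
 * Linux reserves half the socket buffer for data structures. */
static bool fits_rwnd_new(unsigned chunk_len, unsigned rwnd)
{
    return chunk_len <= rwnd;
}
```

For a 16-byte chunk against a 100-byte remaining window, the old check refuses to send while the new one proceeds, which is why the benchmark shows small-chunk throughput jumping from Kbit/s to Mbit/s.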
  7. 15 Jul 2011 (1 commit)
  8. 31 Mar 2011 (1 commit)
  9. 18 Sep 2010 (1 commit)
  10. 27 Aug 2010 (1 commit)
  11. 01 May 2010 (2 commits)
  12. 30 Mar 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Authored by Tejun Heo

      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  13. 24 Nov 2009 (3 commits)
    • sctp: Fix mis-ordering of user space data when multihoming in use · d8dd1578
      Authored by Neil Horman
      Recently had a bug reported to me, in which the user was sending
      packets with a payload containing a sequence number.  The packets
      were getting delivered in order according to the chunk TSN values, but
      the sequence values in the payload were arriving out of order.  At
      first I thought it must be an application error, but we eventually
      found it to be a problem on the transmit side in the sctp stack.
      
      The conditions for the error are that multihoming must be in use,
      and it helps if each transport has a different pmtu.  The problem
      occurs in sctp_outq_flush.  Basically we dequeue packets from the
      data queue, and attempt to append them to the offered packet for a
      given transport.  After we append a data chunk we add the transport
      to the end of a list of transports to have their packets sent at
      the end of sctp_outq_flush.  The problem occurs when a data chunk
      fills up an offered packet on a transport.  The function that does
      the appending (sctp_packet_transmit_chunk), will try to call
      sctp_packet_transmit on the full packet, and then append the chunk
      to a new packet.  This call to sctp_packet_transmit, sends that
      packet ahead of the others that may be queued in the transport_list
      in sctp_outq_flush.  The result is that frames that were sent in one
      order from the user space sending application get re-ordered prior
      to tsn assignment in sctp_packet_transmit, resulting in mis-sequencing
      of data payloads, even though tsn ordering is correct.
      
      The fix is to change where we assign a tsn.  By doing this earlier,
      we are then free to place chunks in packets, whatever way we
      see fit and the protocol will make sure to do all the appropriate
      re-ordering on receive as is needed.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: William Reich <reich@ulticom.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      d8dd1578
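The essence of the fix, moving TSN assignment to the moment a chunk is appended to a packet, can be sketched with a simplified model (names here are hypothetical, not the kernel's): since chunks are appended in the order user space sent them, assigning the TSN at append time makes TSN order match send order even if a full packet is transmitted ahead of its siblings.

```c
#include <assert.h>

/* Hypothetical simplified chunk: user_seq is the application's own
 * sequence number carried in the payload. */
struct chunk { int user_seq; unsigned tsn; };

static unsigned next_tsn; /* association's TSN counter (sketch) */

/* Assign the TSN when the chunk is placed into an outgoing packet,
 * not when the packet is eventually transmitted.  Append order is
 * user send order, so TSN order now matches it regardless of which
 * transport's packet leaves the host first. */
static void append_to_packet(struct chunk *c)
{
    c->tsn = next_tsn++;
}
```

With the old scheme (stamping at transmit time), an early sctp_packet_transmit for a filled packet could grab lower TSNs than chunks the user actually sent first; stamping at append time closes that window.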
    • sctp: Update max.burst implementation · 46d5a808
      Authored by Vlad Yasevich
      Current implementation of max.burst ends up limiting new
      data during cwnd decay period.  The decay is happening because
      the connection is idle and we are allowed to fill the congestion
      window.  The point of max.burst is to limit micro-bursts in response
      to large acks.  This still happens, as max.burst is still applied
      to each transmit opportunity.  It will also apply if a very large
      send is made (greater than allowed by burst).
      Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      46d5a808
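The classic max.burst clamp referred to here can be sketched as follows. This is a simplified illustration of the general technique (the function name and parameterization are mine, not the kernel's): at each transmit opportunity the effective cwnd is capped so that at most max_burst packets beyond what is already in flight can be emitted, which limits micro-bursts after a large ack without throttling steady-state sending.

```c
#include <assert.h>

/* Cap the congestion window at flight_size + max_burst * pmtu for
 * this transmit opportunity, limiting how many back-to-back packets
 * a single large ack can trigger. */
static unsigned burst_limited_cwnd(unsigned cwnd, unsigned flight,
                                   unsigned max_burst, unsigned pmtu)
{
    unsigned cap = flight + max_burst * pmtu;
    return cwnd > cap ? cap : cwnd;
}
```

For example, with nothing in flight, max_burst = 4 and a 1500-byte PMTU, a 64000-byte cwnd is clamped to 6000 bytes for that opportunity; a cwnd already below the cap is untouched.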
    • sctp: Remove useless last_time_used variable · 245cba7e
      Authored by Vlad Yasevich
      The transport last_time_used variable is rather useless.
      It was only used when determining if CWND needs to be updated
      due to idle transport.  However, idle transport detection was
      based on a Heartbeat timer and last_time_used was not incremented
      when sending Heartbeats.  As a result the check for cwnd reduction
      was always true.  We can get rid of the variable and just base
      our cwnd manipulation on the HB timer (like the code comment says).
      We also have to call into the cwnd manipulation function regardless
      of whether HBs are enabled or not.  That way we will detect idle
      transports if the user has disabled Heartbeats.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      245cba7e
  14. 05 Sep 2009 (7 commits)
    • sctp: remove dup code in net/sctp/output.c · be297143
      Authored by Wei Yongjun
      Use sctp_packet_reset() instead of dup code.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      be297143
    • sctp: Correctly track if AUTH has been bundled. · 4007cc88
      Authored by Vlad Yasevich
      We currently track if AUTH has been bundled using the 'auth'
      pointer to the chunk.  However, AUTH is disallowed after DATA
      is already in the packet, so we need to instead use the
      'has_auth' field.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      4007cc88
    • sctp: fix to reset packet information after packet transmit · d521c08f
      Authored by Wei Yongjun
      The packet information does not reset after packet transmit; this
      may cause problems such as a following DATA chunk being sent without
      an AUTH chunk, even if authentication of DATA chunks has been
      requested by the peer.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      d521c08f
    • sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Authored by Vlad Yasevich
      SCTP will delay the last part of a large write due to Nagle, if that
      part is smaller than the MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting until the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better to not delay it.
      
      This is a result of many discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      cb95ea32
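The adjusted Nagle decision can be expressed as a small predicate. This is a simplified sketch of the rule described above, not the kernel's actual condition (which also factors in the DATA chunk overhead from the following commit): a small chunk is delayed only when it is a genuinely small lone send, not the tail fragment of a message larger than the path MTU.

```c
#include <assert.h>
#include <stdbool.h>

/* Decide whether to Nagle-delay a chunk (simplified sketch).
 * chunk_len: size of the chunk about to be sent
 * msg_len:   size of the user message it came from
 * pmtu:      path MTU of the transport
 * inflight:  whether unacked data is outstanding */
static bool nagle_delay(unsigned chunk_len, unsigned msg_len,
                        unsigned pmtu, bool inflight)
{
    if (!inflight)
        return false;          /* nothing outstanding: send now */
    if (msg_len >= pmtu)
        return false;          /* tail of a large write: send now */
    return chunk_len < pmtu;   /* genuinely small send: delay */
}
```

The middle clause is this patch's change: the tail of a fragmented large write would otherwise sit in the queue until the next large write arrived.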
    • sctp: Nagle delay should be based on path mtu · b29e7907
      Authored by Vlad Yasevich
      The decision to delay due to Nagle should be based on the path mtu
      and future packet size.  We currently incorrectly base it on
      'frag_point' which is the SCTP DATA segment size, and also we do
      not count DATA chunk header overhead in the computation.  This
      actually allows situations where a user can set a low 'frag_point',
      and then send small messages without delay.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      b29e7907
    • sctp: Generate SACKs when actually sending outbound DATA · e83963b7
      Authored by Vlad Yasevich
      We are now trying to bundle SACKs when we have outbound
      DATA to send.  However, there are situations where this
      outbound DATA will not be sent (due to congestion or lack of
      available window).  In such cases it's ok to wait for the
      timer to expire.  This patch refactors the sending code
      so that before attempting to bundle the SACK we check
      to see if the DATA will actually be transmitted.
      
      Based on earlier work by Doug Graham <dgraham@nortel.com> and
      Wei Yongjun <yjwei@cn.fujitsu.com>.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      e83963b7
    • sctp: Fix piggybacked ACKs · af87b823
      Authored by Doug Graham
      This patch corrects the conditions under which a SACK will be piggybacked
      on a DATA packet.  The previous condition was incorrect due to a
      misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
      following paragraph from section 6.2 had not been implemented correctly:
      
         Before an endpoint transmits a DATA chunk, if any received DATA
         chunks have not been acknowledged (e.g., due to delayed ack), the
         sender should create a SACK and bundle it with the outbound DATA
         chunk, as long as the size of the final SCTP packet does not exceed
         the current MTU.  See Section 6.2.
      
      When about to send a DATA chunk, the code now checks to see if the SACK
      timer is running.  If it is, we know we have a SACK to send to the
      peer, so we append the SACK (assuming available space in the packet)
      and turn off the timer.  For a simple request-response scenario, this
      will result in the SACK being bundled with the response, meaning that
      the SACK is received quickly by the client, and also meaning that no
      separate SACK packet needs to be sent by the server to acknowledge the
      request.  Prior to this patch, a separate SACK packet would have been
      sent by the server SCTP only after its delayed-ACK timer had expired
      (usually 200ms).  This is wasteful of bandwidth, and can also have a
      major negative impact on performance due to the interaction of delayed ACKs
      with the Nagle algorithm.
      Signed-off-by: Doug Graham <dgraham@nortel.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      af87b823
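The piggyback rule this commit implements is easy to model: when about to send DATA, a running delayed-SACK timer means a SACK is owed, so bundle it (space permitting) and stop the timer. The sketch below uses hypothetical simplified names; the real code operates on the association's timer and packet structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified association state. */
struct assoc { bool sack_timer_running; int sacks_bundled; };

/* Send a DATA chunk; piggyback a pending SACK if one is owed and the
 * final packet would still fit within the current MTU. */
static void send_data(struct assoc *a, bool space_for_sack)
{
    if (a->sack_timer_running && space_for_sack) {
        a->sacks_bundled++;            /* append SACK to this packet */
        a->sack_timer_running = false; /* SACK sent, timer not needed */
    }
    /* ... append DATA chunk and transmit ... */
}
```

In a request-response exchange this bundles the SACK with the response instead of waiting out the ~200ms delayed-ACK timer, avoiding both the extra packet and the delayed-ACK/Nagle interaction the commit message describes.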
  15. 30 Jun 2009 (1 commit)
  16. 03 Jun 2009 (1 commit)
  17. 28 Apr 2009 (1 commit)
  18. 22 Mar 2009 (1 commit)
  19. 16 Feb 2009 (2 commits)
  20. 01 Feb 2009 (1 commit)
  21. 23 Jan 2009 (1 commit)
    • sctp: Properly timestamp outgoing data chunks for rtx purposes · 759af00e
      Authored by Vlad Yasevich
      Recent changes to the retransmit code exposed a long standing
      bug where it was possible for a chunk to be time stamped
      after the retransmit timer was reset.  This caused a rare
      situation where the retransmit timer has expired, but
      nothing was marked for retransmission because all of the
      timestamps on data were less than 1 RTO ago.  As a result,
      the timer was never restarted since nothing was retransmitted,
      and this resulted in a hung association that couldn't
      complete the data transfer.  The solution is to timestamp
      the chunk when it's added to the packet for transmission
      purposes.  After the packet is transmitted the rtx timer
      is restarted.  This guarantees that when the timer expires,
      there will be data to retransmit.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      759af00e
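The ordering guarantee can be shown with a tiny model. This is an illustrative sketch (fake clock, hypothetical names), not the kernel implementation: stamping the chunk when it is appended to the packet, and restarting the rtx timer only after transmission, ensures the stamp is at least one full RTO old when the timer fires, so the chunk is eligible for retransmission and the timer gets restarted.

```c
#include <assert.h>

/* Hypothetical simplified chunk with a transmit timestamp. */
struct chunk { unsigned long sent_at; };

static unsigned long now; /* fake clock for the sketch */

/* Stamp the chunk at the moment it is placed into the packet for
 * transmission (the fix), not at some later point after the timer
 * was already reset (the bug). */
static void packet_append(struct chunk *c) { c->sent_at = now; }

/* A chunk is marked for retransmission when its stamp is at least
 * one RTO in the past. */
static int needs_rtx(const struct chunk *c, unsigned long rto)
{
    return now - c->sent_at >= rto;
}
```

Under the old ordering, a chunk stamped after the timer reset could still look "fresh" at expiry, nothing would be marked, and the timer would never restart, hanging the association.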
  22. 01 Oct 2008 (1 commit)
  23. 18 Sep 2008 (1 commit)
  24. 04 Aug 2008 (1 commit)
    • sctp: Drop ipfragok in sctp_xmit function · f880374c
      Authored by Herbert Xu
      The ipfragok flag controls whether the packet may be fragmented
      either on the local host or beyond.  The latter is only valid on
      IPv4.
      
      In fact, we never want to do the latter even on IPv4 when PMTU is
      enabled.  This is because even though we can't fragment packets
      within SCTP due to the protocol's inherent faults, we can still
      fragment it at IP layer.  By setting the DF bit we will improve
      the PMTU process.
      
      RFC 2960 only says that we SHOULD clear the DF bit in this case,
      so we're compliant even if we set the DF bit.  In fact RFC 4960
      no longer has this statement.
      
      Once we make this change, we only need to control the local
      fragmentation.  There is already a bit in the skb which controls
      that, local_df.  So this patch sets that instead of using the
      ipfragok argument.
      
      The only complication is that there isn't a struct sock object
      per transport, so for IPv4 we have to resort to changing the
      pmtudisc field for every packet.  This should be safe though
      as the protocol is single-threaded.
      
      Note that after this patch we can remove ipfragok from the rest
      of the stack too.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f880374c
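The policy change can be summarized in a couple of helpers. This is a simplified model, not the kernel API (the real per-skb flag was called local_df, and IPv4 additionally required toggling the socket's pmtudisc field per packet): the DF bit follows PMTU discovery unconditionally, and only local fragmentation remains under the caller's control.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified skb carrying the local-fragmentation flag
 * (local_df in the real stack). */
struct skb { bool local_df; };

/* On-path fragmentation policy: with PMTU discovery active, always
 * set DF so intermediate routers report oversized packets instead of
 * fragmenting them, improving the PMTU process. */
static bool df_bit(bool pmtu_discovery)
{
    return pmtu_discovery;
}

/* Local fragmentation policy: the per-skb flag replaces the old
 * ipfragok function argument. */
static void prepare_skb(struct skb *s, bool allow_local_frag)
{
    s->local_df = allow_local_frag;
}
```

As the commit notes, RFC 2960's "SHOULD clear DF" allows setting DF anyway, and RFC 4960 dropped the statement entirely, so the stricter policy stays compliant.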
  25. 19 Jul 2008 (1 commit)
  26. 17 Jul 2008 (1 commit)
  27. 20 Jun 2008 (1 commit)
    • sctp: Follow security requirement of responding with 1 packet · 2e3216cd
      Authored by Vlad Yasevich
      RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
      
      When an SCTP stack receives a packet containing multiple control or
      DATA chunks and the processing of the packet requires the sending of
      multiple chunks in response, the sender of the response chunk(s) MUST
      NOT send more than one packet.  If bundling is supported, multiple
      response chunks that fit into a single packet MAY be bundled together
      into one single response packet.  If bundling is not supported, then
      the sender MUST NOT send more than one response chunk and MUST
      discard all other responses.  Note that this rule does NOT apply to a
      SACK chunk, since a SACK chunk is, in itself, a response to DATA and
      a SACK does not require a response of more DATA.
      
      We implement this by not servicing our outqueue until we reach the end
      of the packet.  This enables maximum bundling.  We also identify
      'response' chunks and make sure that we only send 1 packet when sending
      such chunks.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e3216cd
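The "don't service the outqueue until the end of the packet" strategy can be modeled simply. This is an illustrative sketch with hypothetical names, not the kernel's outqueue code: response chunks generated while processing one inbound packet are queued, and a single flush at the end bundles them into at most one outbound packet.

```c
#include <assert.h>

/* Hypothetical simplified outqueue. */
struct outq { int queued; int packets_sent; };

/* Called for each response chunk generated while processing chunks
 * from one inbound packet; nothing is transmitted yet. */
static void queue_response(struct outq *q) { q->queued++; }

/* Called once, after the whole inbound packet has been processed:
 * all queued responses are bundled into a single outbound packet,
 * satisfying RFC 4960 section 11.4's one-packet limit. */
static void flush(struct outq *q)
{
    if (q->queued) {
        q->packets_sent++;
        q->queued = 0;
    }
}
```

Deferring the flush both maximizes bundling and enforces the MUST NOT in the quoted text, since a non-SCTP-capable (or hostile) peer can never amplify one inbound packet into multiple responses.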
  28. 05 Jun 2008 (1 commit)
  29. 06 Mar 2008 (1 commit)
  30. 05 Feb 2008 (1 commit)