1. 29 11月, 2012 1 次提交
  2. 15 8月, 2012 2 次提交
  3. 23 7月, 2012 1 次提交
    • N
      sctp: Implement quick failover draft from tsvwg · 5aa93bcf
      Neil Horman 提交于
      I've seen several attempts recently made to do quick failover of sctp transports
      by reducing various retransmit timers and counters.  While its possible to
      implement a faster failover on multihomed sctp associations, its not
      particularly robust, in that it can lead to unneeded retransmits, as well as
      false connection failures due to intermittent latency on a network.
      
      Instead, lets implement the new ietf quick failover draft found here:
      http://tools.ietf.org/html/draft-nishida-tsvwg-sctp-failover-05
      
      This will let the sctp stack identify transports that have had a small number of
      errors, and avoid using them quickly until their reliability can be
      re-established.  I've tested this out on two virt guests connected via multiple
      isolated virt networks and believe its in compliance with the above draft and
      works well.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: Sridhar Samudrala <sri@us.ibm.com>
      CC: "David S. Miller" <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      CC: joe@perches.com
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aa93bcf
  4. 21 7月, 2012 1 次提交
  5. 17 7月, 2012 1 次提交
    • D
      net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller 提交于
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future the routes will be without destination address, source address,
      etc. keying.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we have to have a way to possess persistent storage
      for redirects and PMTU information.  This persistent storage will exist
      in the FIB tables, and that's why we'll need to be able to rebuild a
      full lookup flow key here.  Using that flow key will do a fib_lookup()
      and create/update the persistent entry.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6700c270
  6. 16 7月, 2012 1 次提交
  7. 01 7月, 2012 1 次提交
    • N
      sctp: be more restrictive in transport selection on bundled sacks · 4244854d
      Neil Horman 提交于
      It was noticed recently that when we send data on a transport, its possible that
      we might bundle a sack that arrived on a different transport.  While this isn't
      a major problem, it does go against the SHOULD requirement in section 6.4 of RFC
      2960:
      
       An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
         etc.) to the same destination transport address from which it
         received the DATA or control chunk to which it is replying.  This
         rule should also be followed if the endpoint is bundling DATA chunks
         together with the reply chunk.
      
      This patch seeks to correct that.  It restricts the bundling of sack operations
      to only those transports which have moved the ctsn of the association forward
      since the last sack.  By doing this we guarantee that we only bundle outbound
      saks on a transport that has received a chunk since the last sack.  This brings
      us into stricter compliance with the RFC.
      
      Vlad had initially suggested that we strictly allow only sack bundling on the
      transport that last moved the ctsn forward.  While this makes sense, I was
      concerned that doing so prevented us from bundling in the case where we had
      received chunks that moved the ctsn on multiple transports.  In those cases, the
      RFC allows us to select any of the transports having received chunks to bundle
      the sack on.  so I've modified the approach to allow for that, by adding a state
      variable to each transport that tracks weather it has moved the ctsn since the
      last sack.  This I think keeps our behavior (and performance), close enough to
      our current profile that I think we can do this without a sysctl knob to
      enable/disable it.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Vlad Yaseivch <vyasevich@gmail.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-sctp@vger.kernel.org
      Reported-by: NMichele Baldessari <michele@redhat.com>
      Reported-by: Nsorin serban <sserban@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4244854d
  8. 11 5月, 2012 1 次提交
  9. 09 11月, 2011 1 次提交
  10. 09 5月, 2011 1 次提交
  11. 28 4月, 2011 3 次提交
  12. 27 8月, 2010 1 次提交
  13. 16 5月, 2010 1 次提交
  14. 06 5月, 2010 1 次提交
    • V
      sctp: Fix a race between ICMP protocol unreachable and connect() · 50b5d6ad
      Vlad Yasevich 提交于
      ICMP protocol unreachable handling completely disregarded
      the fact that the user may have locked the socket.  It proceeded
      to destroy the association, even though the user may have
      held the lock and had a ref on the association.  This resulted
      in the following:
      
      Attempt to release alive inet socket f6afcc00
      
      =========================
      [ BUG: held lock freed! ]
      -------------------------
      somenu/2672 is freeing memory f6afcc00-f6afcfff, with a lock still held
      there!
       (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      1 lock held by somenu/2672:
       #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c122098a>] sctp_connect+0x13/0x4c
      
      stack backtrace:
      Pid: 2672, comm: somenu Not tainted 2.6.32-telco #55
      Call Trace:
       [<c1232266>] ? printk+0xf/0x11
       [<c1038553>] debug_check_no_locks_freed+0xce/0xff
       [<c10620b4>] kmem_cache_free+0x21/0x66
       [<c1185f25>] __sk_free+0x9d/0xab
       [<c1185f9c>] sk_free+0x1c/0x1e
       [<c1216e38>] sctp_association_put+0x32/0x89
       [<c1220865>] __sctp_connect+0x36d/0x3f4
       [<c122098a>] ? sctp_connect+0x13/0x4c
       [<c102d073>] ? autoremove_wake_function+0x0/0x33
       [<c12209a8>] sctp_connect+0x31/0x4c
       [<c11d1e80>] inet_dgram_connect+0x4b/0x55
       [<c11834fa>] sys_connect+0x54/0x71
       [<c103a3a2>] ? lock_release_non_nested+0x88/0x239
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c1054026>] ? might_fault+0x42/0x7c
       [<c11847ab>] sys_socketcall+0x6d/0x178
       [<c10da994>] ? trace_hardirqs_on_thunk+0xc/0x10
       [<c1002959>] syscall_call+0x7/0xb
      
      This was because the sctp_wait_for_connect() would aqcure the socket
      lock and then proceed to release the last reference count on the
      association, thus cause the fully destruction path to finish freeing
      the socket.
      
      The simplest solution is to start a very short timer in case the socket
      is owned by user.  When the timer expires, we can do some verification
      and be able to do the release properly.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50b5d6ad
  15. 01 5月, 2010 3 次提交
  16. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  17. 29 11月, 2009 1 次提交
    • A
      sctp: on T3_RTX retransmit all the in-flight chunks · 5fdd4bae
      Andrei Pelinescu-Onciul 提交于
      When retransmitting due to T3 timeout, retransmit all the
      in-flight chunks for the corresponding  transport/path, including
      chunks sent less then 1 rto ago.
      This is the correct behaviour according to rfc4960 section 6.3.3
      E3 and
      "Note: Any DATA chunks that were sent to the address for which the
       T3-rtx timer expired but did not fit in one MTU (rule E3 above)
       should be marked for retransmission and sent as soon as cwnd
       allows (normally, when a SACK arrives). ".
      
      This fixes problems when more then one path is present and the T3
      retransmission of the first chunk that timeouts stops the T3 timer
      for the initial active path, leaving all the other in-flight
      chunks waiting forever or until a new chunk is transmitted on the
      same path and timeouts (and this will happen only if the cwnd
      allows sending new chunks, but since cwnd was dropped to MTU by
      the timeout => it will wait until the first heartbeat).
      
      Example: 10 packets in flight, sent at 0.1 s intervals on the
      primary path. The primary path is down and the first packet
      timeouts. The first packet is retransmitted on another path, the
      T3 timer for the primary path is stopped and cwnd is set to MTU.
      All the other 9 in-flight packets will not be retransmitted
      (unless more new packets are sent on the primary path which depend
      on cwnd allowing it, and even in this case the 9 packets will be
      retransmitted only after a new packet timeouts which even in the
      best case would be more then RTO).
      
      This commit reverts d0ce9291 and
      also removes the now unused transport->last_rto, introduced in
       b6157d8e.
      
      p.s  The problem is not only when multiple paths are there.  It
      can happen in a single homed environment.  If the application
      stops sending data, it possible to have a hung association.
      Signed-off-by: NAndrei Pelinescu-Onciul <andrei@iptel.org>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5fdd4bae
  18. 24 11月, 2009 2 次提交
    • V
      sctp: Update max.burst implementation · 46d5a808
      Vlad Yasevich 提交于
      Current implementation of max.burst ends up limiting new
      data during cwnd decay period.  The decay is happening becuase
      the connection is idle and we are allowed to fill the congestion
      window.  The point of max.burst is to limit micro-bursts in response
      to large acks.  This still happens, as max.burst is still applied
      to each transmit opportunity.  It will also apply if a very large
      send is made (greater then allowed by burst).
      Tested-by: NFlorian Niederbacher <florian.niederbacher@student.uibk.ac.at>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      46d5a808
    • V
      sctp: Remove useless last_time_used variable · 245cba7e
      Vlad Yasevich 提交于
      The transport last_time_used variable is rather useless.
      It was only used when determining if CWND needs to be updated
      due to idle transport.  However, idle transport detection was
      based on a Heartbeat timer and last_time_used was not incremented
      when sending Heartbeats.  As a result the check for cwnd reduction
      was always true.  We can get rid of the variable and just base
      our cwnd manipulation on the HB timer (like the code comment sais).
      We also have to call into the cwnd manipulation function regardless
      of whether HBs are enabled or not.  That way we will detect idle
      transports if the user has disabled Heartbeats.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      245cba7e
  19. 14 11月, 2009 1 次提交
  20. 05 9月, 2009 1 次提交
  21. 03 3月, 2009 1 次提交
  22. 16 2月, 2009 1 次提交
    • V
      sctp: Fix the RTO-doubling on idle-link heartbeats · faee47cd
      Vlad Yasevich 提交于
      SCTP incorrectly doubles rto ever time a Hearbeat chunk
      is generated.   However RFC 4960 states:
      
         On an idle destination address that is allowed to heartbeat, it is
         recommended that a HEARTBEAT chunk is sent once per RTO of that
         destination address plus the protocol parameter 'HB.interval', with
         jittering of +/- 50% of the RTO value, and exponential backoff of the
         RTO if the previous HEARTBEAT is unanswered.
      
      Essentially, of if the heartbean is unacknowledged, do we double the RTO.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      faee47cd
  23. 19 7月, 2008 1 次提交
    • F
      sctp: Prevent uninitialized memory access · 6d0ccbac
      Florian Westphal 提交于
      valgrind reports uninizialized memory accesses when running
      sctp inside the network simulation cradle simulator:
      
       Conditional jump or move depends on uninitialised value(s)
          at 0x570E34A: sctp_assoc_sync_pmtu (associola.c:1324)
          by 0x57427DA: sctp_packet_transmit (output.c:403)
          by 0x5710EFF: sctp_outq_flush (outqueue.c:824)
          by 0x5710B88: sctp_outq_uncork (outqueue.c:701)
          by 0x5745262: sctp_cmd_interpreter (sm_sideeffect.c:1548)
          by 0x57444B7: sctp_side_effects (sm_sideeffect.c:976)
          by 0x5744460: sctp_do_sm (sm_sideeffect.c:945)
          by 0x572157D: sctp_primitive_ASSOCIATE (primitive.c:94)
          by 0x5725C04: __sctp_connect (socket.c:1094)
          by 0x57297DC: sctp_connect (socket.c:3297)
      
       Conditional jump or move depends on uninitialised value(s)
          at 0x575D3A5: mod_timer (timer.c:630)
          by 0x5752B78: sctp_cmd_hb_timers_start (sm_sideeffect.c:555)
          by 0x5754133: sctp_cmd_interpreter (sm_sideeffect.c:1448)
          by 0x5753607: sctp_side_effects (sm_sideeffect.c:976)
          by 0x57535B0: sctp_do_sm (sm_sideeffect.c:945)
          by 0x571E9AE: sctp_endpoint_bh_rcv (endpointola.c:474)
          by 0x573347F: sctp_inq_push (inqueue.c:104)
          by 0x572EF93: sctp_rcv (input.c:256)
          by 0x5689623: ip_local_deliver_finish (ip_input.c:230)
          by 0x5689759: ip_local_deliver (ip_input.c:268)
          by 0x5689CAC: ip_rcv_finish (dst.h:246)
      
      #1 is due to "if (t->pmtu_pending)".
      8a479491 "[SCTP] Flag a pmtu change request"
      suggests it should be initialized to 0.
      
      #2 is the heartbeat timer 'expires' value, which is uninizialised, but
      test by mod_timer().
      T3_rtx_timer seems to be affected by the same problem, so initialize it, too.
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d0ccbac
  24. 05 6月, 2008 3 次提交
  25. 06 3月, 2008 1 次提交
  26. 05 2月, 2008 1 次提交
  27. 29 1月, 2008 1 次提交
  28. 08 11月, 2007 1 次提交
    • V
      SCTP: Fix difference cases of retransmit. · b6157d8e
      Vlad Yasevich 提交于
      Commit d0ce9291 broke several retransmit
      cases including fast retransmit.  The reason is that we should
      only delay by rto while doing retranmists as a result of a timeout.
      Retransmit as a result of path mtu discover, fast retransmit, or
      other evernts that should trigger immidiate retransmissions got broken.
      
      Also, since rto is doubled prior to marking of packets elegable for
      retransmission, we never marked correct chunks anyway.
      
      The fix is provide a reason for a given retransmission so that we
      can mark chunks appropriately and to save the old rto value to do
      comparisons against.
      
      All regressions tests passed with this code.
      
      Spotted by Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      b6157d8e
  29. 14 6月, 2007 2 次提交
  30. 26 4月, 2007 1 次提交
  31. 23 3月, 2007 1 次提交