1. 05 9月, 2009 21 次提交
    • V
      sctp: Failover transmitted list on transport delete · 31b02e15
      Vlad Yasevich 提交于
      Add-IP feature allows users to delete an active transport.  If that
      transport has chunks in flight, those chunks need to be moved to another
      transport or association may get into unrecoverable state.
      Reported-by: NRafael Laufer <rlaufer@cisco.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      31b02e15
    • V
      sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05
      Vlad Yasevich 提交于
      We had a bug that we never stored the user-defined value for
      MAXSEG when setting the value on an association.  Thus future
      PMTU events ended up re-writing the frag point and increasing
      it past user limit.  Additionally, when setting the option on
      the socket/endpoint, we effect all current associations, which
      is against spec.
      
      Now, we store the user 'maxseg' value along with the computed
      'frag_point'.  We inherit 'maxseg' from the socket at association
      creation and use it as an upper limit for 'frag_point' when its
      set.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      f68b2e05
    • V
      sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Vlad Yasevich 提交于
      SCTP will delay the last part of a large write due to NAGLE, if that
      part is smaller then MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting untill the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better to not delay it.
      
      This is a result of much discussions with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      cb95ea32
    • V
      sctp: Nagle delay should be based on path mtu · b29e7907
      Vlad Yasevich 提交于
      The decision to delay due to Nagle should be based on the path mtu
      and future packet size.  We currently incorrectly base it on
      'frag_point' which is the SCTP DATA segment size, and also we do
      not count DATA chunk header overhead in the computation.  This
      actuall allows situations where a user can set low 'frag_point',
      and then send small messages without delay.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      b29e7907
    • V
      sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. · d4d6fb57
      Vlad Yasevich 提交于
      We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
      This results in an hung association if the remote only uses
      SHUTDOWNs (which it's allowed to do) to acknowlege DATA when
      closing.  The reason for that is that we simply honor the a_rwnd
      from the sack, but since we faked it to be 0, we enter 0-window
      probing.  The fix is to use the peers old rwnd and add our flight
      size to it.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      d4d6fb57
    • V
      sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6
      Vlad Yasevich 提交于
      SCTP has a problem that when small chunks are used, it is possible
      to exhaust the receiver buffer without fully closing receive window.
      This happens due to all overhead that we have account for with small
      messages.  To fix this, when receive buffer is exceeded, we'll drop
      the window to 0 and save the 'drop' portion.  When application starts
      reading data and freeing up recevie buffer space, we'll wait until
      we've reached the 'drop' window and then add back this 'drop' one
      mtu at a time.  This worked well in testing and under stress produced
      rather even recovery.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      4d3c46e6
    • V
      sctp: Clear fast_recovery on the transport when T3 timer expires. · 33ce8281
      Vlad Yasevich 提交于
      If T3 timer expires, we are retransmitting data due to timeout any
      any fast recovery is null and void.  We can clear the fast recovery
      flag.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      33ce8281
    • V
      sctp: Fix error count increments that were results of HEARTBEATS · b9f84786
      Vlad Yasevich 提交于
      SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
      errors agains a given transport or endpoint.  As such, we
      should increment the error counts for only for unacknowledged
      HB, otherwise we detect failure too soon.  This goes for both
      the overall error count and the path error count.
      
      Now, there is a difference in how the detection is done
      between the two.  The path error detection is done after
      the increment, so to detect it properly, we actually need
      to exceed the path threshold.  The overall error detection
      is done _BEFORE_ the increment.  Thus to detect the failure,
      it's enough for the error count to match the threshold.
      This is why all the state functions use '>=' to detect failure,
      while path detection uses '>'.
      
      Thanks goes to Chunbo Luo <chunbo.luo@windriver.com> who first
      proposed patches to fix this issue and made me re-read the spec
      and the code to figure out how this cruft really works.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      b9f84786
    • A
      sctp: use proc_create() · d71a09ed
      Alexey Dobriyan 提交于
      create_proc_entry() is deprecated (not formally, though).
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      d71a09ed
    • W
      sctp: fix check the chunk length of received HEARTBEAT-ACK chunk · dadb50cc
      Wei Yongjun 提交于
      The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
      that contains the Heartbeat Information field copied from the
      received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
      must have a length of:
        sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)
      
      A badly formatted HB-ACK chunk, it is possible that we may access
      invalid memory.  We should really make sure that the chunk format
      is what we expect, before attempting to touch the data.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      dadb50cc
    • W
      sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN · a2f36eec
      Wei Yongjun 提交于
      If Cumulative TSN Ack field of SHUTDOWN chunk is less than the
      Cumulative TSN Ack Point then drop the SHUTDOWN chunk.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      a2f36eec
    • V
      sctp: Send user messages to the lower layer as one · 9c5c62be
      Vlad Yasevich 提交于
      Currenlty, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a sigle packet (ex: due
      to user setting a low fragmentation threashold).
      
      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue
      thus causing the chunks to get transmitted.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      9c5c62be
    • V
      sctp: Try to encourage SACK bundling with DATA. · 5d7ff261
      Vlad Yasevich 提交于
      If the association has a SACK timer pending and now DATA queued
      to be send, we'll try to bundle the SACK with the next application send.
      As such, try encourage bundling by accounting for SACK in the size
      of the first chunk fragment.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      5d7ff261
    • V
      sctp: Generate SACKs when actually sending outbound DATA · e83963b7
      Vlad Yasevich 提交于
      We are now trying to bundle SACKs when we have outbound
      DATA to send.  However, there are situations where this
      outbound DATA will not be sent (due to congestion or 
      available window).  In such cases it's ok to wait for the
      timer to expire.  This patch refactors the sending code
      so that betfore attempting to bundle the SACK we check
      to see if the DATA will actually be transmitted.
      
      Based on eirlier works for Doug Graham <dgraham@nortel.com> and
      Wei Youngjun <yjwei@cn.fujitsu.com>.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      e83963b7
    • V
      sctp: Fix data segmentation with small frag_size · 3e62abf9
      Vlad Yasevich 提交于
      Since an application may specify the maximum SCTP fragment size
      that all data should be fragmented to, we need to fix how
      we do segmentation.   Right now, if a user specifies a small
      fragment size, the segment size can go negative in the presence
      of AUTH or COOKIE_ECHO bundling.
      
      What we need to do is track the largest possbile DATA chunk that
      can fit into the mtu.  Then if the fragment size specified is
      bigger then this maximum length, we'll shrink it down.  Otherwise,
      we just use the smaller segment size without changing it further.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      3e62abf9
    • V
      sctp: Disallow new connection on a closing socket · bec9640b
      Vlad Yasevich 提交于
      If a socket has a lot of association that are in the process of
      of being closed/aborted, it is possible for a remote to establish
      new associations during the time period that the old ones are shutting
      down.  If this was a result of a close() call, there will be no socket
      and will cause a memory leak.  We'll prevent this by setting the
      socket state to CLOSING and disallow new associations when in this state.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      bec9640b
    • D
      sctp: Fix piggybacked ACKs · af87b823
      Doug Graham 提交于
      This patch corrects the conditions under which a SACK will be piggybacked
      on a DATA packet.  The previous condition was incorrect due to a
      misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
      following paragraph from section 6.2 had not been implemented correctly:
      
         Before an endpoint transmits a DATA chunk, if any received DATA
         chunks have not been acknowledged (e.g., due to delayed ack), the
         sender should create a SACK and bundle it with the outbound DATA
         chunk, as long as the size of the final SCTP packet does not exceed
         the current MTU.  See Section 6.2.
      
      When about to send a DATA chunk, the code now checks to see if the SACK
      timer is running.  If it is, we know we have a SACK to send to the
      peer, so we append the SACK (assuming available space in the packet)
      and turn off the timer.  For a simple request-response scenario, this
      will result in the SACK being bundled with the response, meaning the
      the SACK is received quickly by the client, and also meaning that no
      separate SACK packet needs to be sent by the server to acknowledge the
      request.  Prior to this patch, a separate SACK packet would have been
      sent by the server SCTP only after its delayed-ACK timer had expired
      (usually 200ms).  This is wasteful of bandwidth, and can also have a
      major negative impact on performance due the interaction of delayed ACKs
      with the Nagle algorithm.
      Signed-off-by: NDoug Graham <dgraham@nortel.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      af87b823
    • V
      sctp: release cached route when the transport goes down. · 40187886
      Vlad Yasevich 提交于
      When the sctp transport is marked down, we can release the
      cached route and force a new lookup when attempting to use
      this transport for anything.  This way, if a better route
      or source address is available, we'll try to use it.
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      40187886
    • W
      sctp: update the route for non-active transports after addresses are added · 3cd9749c
      Wei Yongjun 提交于
      Update the route and saddr entries for the non-active transports as some
      of the added addresses can be used as better source addresses, or may
      be there is a better route.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      3cd9749c
    • W
      sctp: check the unrecognized ASCONF parameter before access it · 44e65c1e
      Wei Yongjun 提交于
      This patch fix to check the unrecognized ASCONF parameter before
      access it.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      44e65c1e
    • W
      sctp: avoid overwrite the return value of sctp_process_asconf_ack() · 425e0f68
      Wei Yongjun 提交于
      The return value of sctp_process_asconf_ack() may be
      overwritten while process parameters with no error.
      This patch fixed the problem.
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      425e0f68
  2. 10 8月, 2009 1 次提交
    • R
      sctp: fix missing destroy of percpu counter variable in sctp_proc_exit() · 418372b0
      Rafael Laufer 提交于
      Commit 1748376b,
      	net: Use a percpu_counter for sockets_allocated
      
      added percpu_counter function calls to sctp_proc_init code path, but
      forgot to add them to sctp_proc_exit().  This resulted in a following
      Ooops when performing this test
      	# modprobe sctp
      	# rmmod -f sctp
      	# modprobe sctp
      
      [  573.862512] BUG: unable to handle kernel paging request at f8214a24
      [  573.862518] IP: [<c0308b8f>] __percpu_counter_init+0x3f/0x70
      [  573.862530] *pde = 37010067 *pte = 00000000
      [  573.862534] Oops: 0002 [#1] SMP
      [  573.862537] last sysfs file: /sys/module/libcrc32c/initstate
      [  573.862540] Modules linked in: sctp(+) crc32c libcrc32c binfmt_misc bridge
      stp bnep lp snd_hda_codec_analog snd_hda_intel snd_hda_codec snd_hwdep
      snd_pcm_oss snd_mixer_oss arc4 joydev snd_pcm ecb pcmcia snd_seq_dummy
      snd_seq_oss iwlagn iwlcore snd_seq_midi snd_rawmidi snd_seq_midi_event
      yenta_socket rsrc_nonstatic thinkpad_acpi snd_seq snd_timer snd_seq_device
      mac80211 psmouse sdhci_pci sdhci nvidia(P) ppdev video snd soundcore serio_raw
      pcspkr iTCO_wdt iTCO_vendor_support led_class ricoh_mmc pcmcia_core intel_agp
      nvram agpgart usbhid parport_pc parport output snd_page_alloc cfg80211 btusb
      ohci1394 ieee1394 e1000e [last unloaded: sctp]
      [  573.862589]
      [  573.862593] Pid: 5373, comm: modprobe Tainted: P  R        (2.6.31-rc3 #6)
      7663B15
      [  573.862596] EIP: 0060:[<c0308b8f>] EFLAGS: 00010286 CPU: 1
      [  573.862599] EIP is at __percpu_counter_init+0x3f/0x70
      [  573.862602] EAX: f8214a20 EBX: f80faa14 ECX: c48c0000 EDX: f80faa20
      [  573.862604] ESI: f80a7000 EDI: 00000000 EBP: f69d5ef0 ESP: f69d5eec
      [  573.862606]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
      [  573.862610] Process modprobe (pid: 5373, ti=f69d4000 task=c2130c70
      task.ti=f69d4000)
      [  573.862612] Stack:
      [  573.862613]  00000000 f69d5f18 f80a70a8 f80fa9fc 00000000 fffffffc f69d5f30
      c018e2d4
      [  573.862619] <0> 00000000 f80a7000 00000000 f69d5f88 c010112b 00000000
      c07029c0 fffffffb
      [  573.862626] <0> 00000000 f69d5f38 c018f83f f69d5f54 c0557cad f80fa860
      00000001 c07010c0
      [  573.862634] Call Trace:
      [  573.862644]  [<f80a70a8>] ? sctp_init+0xa8/0x7d4 [sctp]
      [  573.862650]  [<c018e2d4>] ? marker_update_probe_range+0x184/0x260
      [  573.862659]  [<f80a7000>] ? sctp_init+0x0/0x7d4 [sctp]
      [  573.862662]  [<c010112b>] ? do_one_initcall+0x2b/0x160
      [  573.862666]  [<c018f83f>] ? tracepoint_module_notify+0x2f/0x40
      [  573.862671]  [<c0557cad>] ? notifier_call_chain+0x2d/0x70
      [  573.862678]  [<c01588fd>] ? __blocking_notifier_call_chain+0x4d/0x60
      [  573.862682]  [<c016b2f1>] ? sys_init_module+0xb1/0x1f0
      [  573.862686]  [<c0102ffc>] ? sysenter_do_call+0x12/0x28
      [  573.862688] Code: 89 48 08 b8 04 00 00 00 e8 df aa ec ff ba f4 ff ff ff 85
      c0 89 43 14 74 31 b8 b0 18 71 c0 e8 19 b9 24 00 a1 c4 18 71 c0 8d 53 0c <89> 50
      04 89 43 0c b8 b0 18 71 c0 c7 43 10 c4 18 71 c0 89 15 c4
      [  573.862725] EIP: [<c0308b8f>] __percpu_counter_init+0x3f/0x70 SS:ESP
      0068:f69d5eec
      [  573.862730] CR2: 00000000f8214a24
      [  573.862734] ---[ end trace 39c4e0b55e7cf54d ]---
      Signed-off-by: NRafael Laufer <rlaufer@cisco.com>
      Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      418372b0
  3. 06 8月, 2009 1 次提交
  4. 07 7月, 2009 1 次提交
    • W
      sctp: fix warning at inet_sock_destruct() while release sctp socket · 1bc4ee40
      Wei Yongjun 提交于
      Commit 'net: Move rx skb_orphan call to where needed' broken sctp protocol
      with warning at inet_sock_destruct(). Actually, sctp can do this right with
      sctp_sock_rfree_frag() and sctp_skb_set_owner_r_frag() pair.
      
          sctp_sock_rfree_frag(skb);
          sctp_skb_set_owner_r_frag(skb, newsk);
      
      This patch not revert the commit d55d87fd,
      instead remove the sctp_sock_rfree_frag() function.
      
      ------------[ cut here ]------------
      WARNING: at net/ipv4/af_inet.c:151 inet_sock_destruct+0xe0/0x142()
      Modules linked in: sctp ipv6 dm_mirror dm_region_hash dm_log dm_multipath
      scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: scsi_wait_scan]
      Pid: 1808, comm: sctp_test Not tainted 2.6.31-rc2 #40
      Call Trace:
       [<c042dd06>] warn_slowpath_common+0x6a/0x81
       [<c064a39a>] ? inet_sock_destruct+0xe0/0x142
       [<c042dd2f>] warn_slowpath_null+0x12/0x15
       [<c064a39a>] inet_sock_destruct+0xe0/0x142
       [<c05fde44>] __sk_free+0x19/0xcc
       [<c05fdf50>] sk_free+0x18/0x1a
       [<ca0d14ad>] sctp_close+0x192/0x1a1 [sctp]
       [<c0649f7f>] inet_release+0x47/0x4d
       [<c05fba4d>] sock_release+0x19/0x5e
       [<c05fbab3>] sock_close+0x21/0x25
       [<c049c31b>] __fput+0xde/0x189
       [<c049c3de>] fput+0x18/0x1a
       [<c049988f>] filp_close+0x56/0x60
       [<c042f422>] put_files_struct+0x5d/0xa1
       [<c042f49f>] exit_files+0x39/0x3d
       [<c043086a>] do_exit+0x1a5/0x5dd
       [<c04a86c2>] ? d_kill+0x35/0x3b
       [<c0438fa4>] ? dequeue_signal+0xa6/0x115
       [<c0430d05>] do_group_exit+0x63/0x8a
       [<c0439504>] get_signal_to_deliver+0x2e1/0x2f9
       [<c0401d9e>] do_notify_resume+0x7c/0x6b5
       [<c043f601>] ? autoremove_wake_function+0x0/0x34
       [<c04a864e>] ? __d_free+0x3d/0x40
       [<c04a867b>] ? d_free+0x2a/0x3c
       [<c049ba7e>] ? vfs_write+0x103/0x117
       [<c05fc8fa>] ? sys_socketcall+0x178/0x182
       [<c0402a56>] work_notifysig+0x13/0x19
      ---[ end trace 9db92c463e789fba ]---
      Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Acked-by: NVlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1bc4ee40
  5. 30 6月, 2009 1 次提交
  6. 23 6月, 2009 1 次提交
  7. 18 6月, 2009 1 次提交
  8. 10 6月, 2009 1 次提交
  9. 09 6月, 2009 1 次提交
  10. 03 6月, 2009 11 次提交