1. 29 Apr, 2010 1 commit
    • sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4) · 5fa782c2
      Committed by Neil Horman
      Ok, version 4
      
      Change Notes:
      1) Minor cleanups, from Vlad's notes
      
      Summary:
      
      Hey-
      	Recently, it was reported to me that the kernel could oops in the
      following way:
      
      <5> kernel BUG at net/core/skbuff.c:91!
      <5> invalid operand: 0000 [#1]
      <5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
      ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
      vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
      ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
      snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
      pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
      mptbase sd_mod scsi_mod
      <5> CPU:    0
      <5> EIP:    0060:[<c02bff27>]    Not tainted VLI
      <5> EFLAGS: 00010216   (2.6.9-89.0.25.EL)
      <5> EIP is at skb_over_panic+0x1f/0x2d
      <5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
      <5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
      <5> ds: 007b   es: 007b   ss: 0068
      <5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
      <5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
      e0c2947d
      <5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
      df653490
      <5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
      00000004
      <5> Call Trace:
      <5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
      <5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
      <5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
      <5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
      <5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
      <5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
      <5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
      <5>  [<c01555a4>] cache_grow+0x140/0x233
      <5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
      <5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
      <5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
      <5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
      <5>  [<c02d005e>] nf_iterate+0x40/0x81
      <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
      <5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
      <5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
      <5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
      <5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
      <5>  [<c02e103e>] ip_rcv+0x334/0x3b4
      <5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
      <5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
      <5>  [<c02c67a4>] process_backlog+0x6c/0xd9
      <5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
      <5>  [<c012a7b1>] __do_softirq+0x35/0x79
      <5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
      <5>  [<c01094de>] do_softirq+0x46/0x4d
      
      It's an skb_over_panic BUG halt that results from processing an INIT chunk in
      which too many of its variable-length parameters are in some way malformed.
      
      The problem is in sctp_process_unk_param:
      if (NULL == *errp)
      	*errp = sctp_make_op_error_space(asoc, chunk,
      					 ntohs(chunk->chunk_hdr->length));
      
      	if (*errp) {
      		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
      				 WORD_ROUND(ntohs(param.p->length)));
      		sctp_addto_chunk(*errp,
      			WORD_ROUND(ntohs(param.p->length)),
      				  param.v);
      
      When we allocate an error chunk, we assume that the worst case requires
      chunk_hdr->length bytes of data, which would nominally be correct, given that
      we call sctp_addto_chunk for each violating parameter.  Unfortunately,
      sctp_init_cause also inserts a sctp_errhdr_t structure into the error chunk
      for each reported parameter, so the worst case, in which all parameters are in
      violation, requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count)
      bytes of data.
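A minimal sketch of the sizing arithmetic above and of a fixed-size reporting strategy. This is illustrative user-space C, not the kernel patch: `errhdr_sketch`, `worst_case_bytes` and `try_report` are hypothetical names, and the 4-byte header layout is an assumption standing in for sctp_errhdr_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for sctp_errhdr_t: a 2-byte cause code plus a 2-byte length. */
struct errhdr_sketch {
	uint16_t cause;
	uint16_t length;
};

/* Bytes needed if every one of param_count parameters gets its own cause. */
static size_t worst_case_bytes(size_t chunk_len, size_t param_count)
{
	return chunk_len + sizeof(struct errhdr_sketch) * param_count;
}

/*
 * Fixed-size strategy: account for a cause only if its header plus payload
 * still fit.  Returns 1 and advances *used on success, 0 if it is skipped.
 */
static int try_report(size_t capacity, size_t *used, size_t param_len)
{
	size_t need = sizeof(struct errhdr_sketch) + param_len;

	assert(*used <= capacity);
	if (need > capacity - *used)
		return 0;
	*used += need;
	return 1;
}
```

Under these assumptions, a 128-byte INIT chunk with 10 bad parameters needs 40 bytes more than the chunk length alone, which is the overrun that trips skb_over_panic.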
      
      The result of this error is that a deliberately malformed packet sent to a
      listening host can cause a remote DOS, described in CVE-2010-1173:
      http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173
      
      I've tested the fix below and confirmed that it resolves the issue.  We move to a
      strategy whereby we allocate a fixed-size error chunk and ignore errors we don't
      have space to report.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 04 Dec, 2009 1 commit
  3. 29 Nov, 2009 1 commit
    • sctp: on T3_RTX retransmit all the in-flight chunks · 5fdd4bae
      Committed by Andrei Pelinescu-Onciul
      When retransmitting due to a T3 timeout, retransmit all the
      in-flight chunks for the corresponding transport/path, including
      chunks sent less than 1 RTO ago.
      This is the correct behaviour according to RFC 4960 section 6.3.3
      E3 and
      "Note: Any DATA chunks that were sent to the address for which the
       T3-rtx timer expired but did not fit in one MTU (rule E3 above)
       should be marked for retransmission and sent as soon as cwnd
       allows (normally, when a SACK arrives)."
      
      This fixes problems when more than one path is present and the T3
      retransmission of the first chunk that times out stops the T3 timer
      for the initial active path, leaving all the other in-flight
      chunks waiting forever or until a new chunk is transmitted on the
      same path and times out (and this will happen only if the cwnd
      allows sending new chunks, but since cwnd was dropped to MTU by
      the timeout, it will wait until the first heartbeat).
      
      Example: 10 packets in flight, sent at 0.1 s intervals on the
      primary path. The primary path is down and the first packet
      times out. The first packet is retransmitted on another path, the
      T3 timer for the primary path is stopped and cwnd is set to MTU.
      All the other 9 in-flight packets will not be retransmitted
      (unless more new packets are sent on the primary path, which depends
      on cwnd allowing it, and even in this case the 9 packets will be
      retransmitted only after a new packet times out, which even in the
      best case would take more than one RTO).
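The behaviour described above can be sketched as marking every in-flight chunk on the timed-out path, no matter how recently it was sent. This is a simplified user-space illustration; `chunk_sketch` and `mark_all_for_rtx` are hypothetical names, not kernel structures or functions.

```c
#include <assert.h>
#include <stddef.h>

struct chunk_sketch {
	int transport_id;	/* path the chunk was last sent on */
	int in_flight;		/* 1 while unacknowledged */
	int needs_rtx;		/* 1 once marked for retransmission */
};

/*
 * On T3-rtx expiry for transport_id, mark every in-flight chunk on that
 * path, with no "sent less than one RTO ago" exemption.
 * Returns the number of chunks marked.
 */
static size_t mark_all_for_rtx(struct chunk_sketch *chunks, size_t n,
			       int transport_id)
{
	size_t marked = 0;

	assert(chunks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (chunks[i].in_flight &&
		    chunks[i].transport_id == transport_id) {
			chunks[i].needs_rtx = 1;
			marked++;
		}
	}
	return marked;
}
```

In the 10-packet example, all 10 chunks on the primary path would be marked on the first timeout, and the sender would then pace them out as cwnd allows.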
      
      This commit reverts d0ce9291 and also removes the now unused
      transport->last_rto, introduced in b6157d8e.
      
      P.S. The problem does not occur only when multiple paths are present.  It
      can also happen in a single-homed environment.  If the application
      stops sending data, it is possible to have a hung association.
      Signed-off-by: Andrei Pelinescu-Onciul <andrei@iptel.org>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 24 Nov, 2009 3 commits
    • sctp: Update max.burst implementation · 46d5a808
      Committed by Vlad Yasevich
      The current implementation of max.burst ends up limiting new
      data during the cwnd decay period.  The decay is happening because
      the connection is idle and we are allowed to fill the congestion
      window.  The point of max.burst is to limit micro-bursts in response
      to large acks.  This still happens, as max.burst is still applied
      to each transmit opportunity.  It will also apply if a very large
      send is made (greater than allowed by burst).
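One way to express a per-opportunity burst limit is to cap the usable window at the data already in flight plus max.burst packets. The sketch below assumes that formulation; the commit message does not spell out the exact computation, and `burst_limited_cwnd` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Window usable for one transmit opportunity: never more than
 * flight_size + max_burst * pmtu, and never more than cwnd itself.
 */
static uint32_t burst_limited_cwnd(uint32_t cwnd, uint32_t flight_size,
				   uint32_t max_burst, uint32_t pmtu)
{
	uint32_t limit;

	assert(pmtu > 0);
	limit = flight_size + max_burst * pmtu;
	return cwnd > limit ? limit : cwnd;
}
```

With cwnd = 60000, 3000 bytes in flight, max_burst = 4 and a 1500-byte PMTU, the sender may use 9000 bytes of window for this opportunity; a cwnd already below the limit is left untouched.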
      Tested-by: Florian Niederbacher <florian.niederbacher@student.uibk.ac.at>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Remove useless last_time_used variable · 245cba7e
      Committed by Vlad Yasevich
      The transport last_time_used variable is rather useless.
      It was only used when determining if CWND needs to be updated
      due to idle transport.  However, idle transport detection was
      based on a Heartbeat timer and last_time_used was not incremented
      when sending Heartbeats.  As a result the check for cwnd reduction
      was always true.  We can get rid of the variable and just base
      our cwnd manipulation on the HB timer (like the code comment says).
      We also have to call into the cwnd manipulation function regardless
      of whether HBs are enabled or not.  That way we will detect idle
      transports if the user has disabled Heartbeats.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Update SWS avoidance receiver side algorithm · 90f2f531
      Committed by Vlad Yasevich
      We currently send window update SACKs every time we free up 1 PMTU
      worth of data.  That is a lot more SACKs than necessary.  Instead, we'll
      now send back the actual window every time we send a SACK, and send
      window-update SACKs when a fraction of the receive buffer has been
      opened.  The fraction is controlled with a sysctl.
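A rough sketch of such a threshold check, assuming the fraction is expressed as a right shift of the receive buffer size and that one PMTU serves as a floor. Both choices, and the name `should_send_window_update`, are assumptions made for illustration rather than details taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Decide whether enough receive window has opened since the last
 * advertisement to justify a window-update SACK.
 *   rwnd            current receive window
 *   advertised_rwnd window last reported to the peer
 *   rcvbuf          receive buffer size in bytes
 *   shift           fraction of rcvbuf, as rcvbuf >> shift (assumed knob)
 *   pmtu            floor for the threshold (assumed)
 */
static int should_send_window_update(uint32_t rwnd, uint32_t advertised_rwnd,
				     uint32_t rcvbuf, unsigned int shift,
				     uint32_t pmtu)
{
	uint32_t threshold;

	assert(shift < 32);
	threshold = rcvbuf >> shift;
	if (threshold < pmtu)
		threshold = pmtu;
	return rwnd >= advertised_rwnd &&
	       (rwnd - advertised_rwnd) >= threshold;
}
```

With a 64 KiB buffer and shift 4, the threshold is 4096 bytes, so freeing 2000 bytes stays silent while freeing 10000 bytes triggers an update.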
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
  5. 14 Nov, 2009 1 commit
    • sctp: Set source addresses on the association before adding transports · 409b95af
      Committed by Vlad Yasevich
      Recent commit 8da645e1
          sctp: Get rid of an extra routing lookup when adding a transport
      introduced a regression in the connection setup.  The behavior was
      different between IPv4 and IPv6.  The IPv4 case ended up working because the
      route lookup routine returned a NULL route, which triggered another
      route lookup later in the output path that succeeded.  In the IPv6 case,
      a valid route was returned for the first call, but we could not find a valid
      source address at the time since the source addresses were not set on the
      association yet.  This resulted in a hung connection.
      
      The solution is to set the source addresses on the association prior to
      adding peers.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 01 Oct, 2009 1 commit
  7. 05 Sep, 2009 7 commits
    • sctp: turn flags in 'struct sctp_association' into bit fields · 9237ccbc
      Committed by Wei Yongjun
      This shrinks the size of struct sctp_association a little.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Sysctl configuration for IPv4 Address Scoping · 72388433
      Committed by Bhaskar Dutta
      This patch introduces a new sysctl option to make IPv4 Address Scoping
      configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
      
      In networking environments where DNAT rules in iptables prerouting
      chains convert destination IPs to link-local/private IP addresses,
      SCTP connections fail to establish as the INIT chunk is dropped by the
      kernel due to address scope match failure.
      For example, to support overlapping IP addresses (same IP address with
      different VLAN ID), a Layer-5 application listens on link-local IPs,
      and there is a DNAT rule that maps the destination IP to a link-local
      IP. Such applications never get the SCTP INIT if the address-scoping
      draft is strictly followed.
      
      This sysctl configuration allows SCTP to function in such
      unconventional networking environments.
      
      Sysctl options:
      0 - Disable IPv4 address scoping draft altogether
      1 - Enable IPv4 address scoping (default, current behavior)
      2 - Enable address scoping but allow IPv4 private addresses in init/init-ack
      3 - Enable address scoping but allow IPv4 link local address in init/init-ack
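The four values listed above could be modelled roughly as follows. The enum and function names are hypothetical, and the sketch only captures which address class each setting exempts from the scope check, not the full scoping rules of the draft.

```c
#include <assert.h>

/* Hypothetical names for the four sysctl values listed above. */
enum scope_policy_sketch {
	SCOPE_POLICY_DISABLE = 0,	/* scoping off altogether */
	SCOPE_POLICY_ENABLE  = 1,	/* strict scoping (default) */
	SCOPE_POLICY_PRIVATE = 2,	/* scoping on, private addresses allowed */
	SCOPE_POLICY_LINK    = 3	/* scoping on, link-local addresses allowed */
};

enum addr_class_sketch { ADDR_GLOBAL, ADDR_PRIVATE, ADDR_LINK_LOCAL };

/*
 * Returns 1 if an address of class cls in an INIT/INIT-ACK is exempt from
 * the scope check under the given policy, 0 if normal scoping applies.
 */
static int bypasses_scope_check(int policy, enum addr_class_sketch cls)
{
	assert(policy >= SCOPE_POLICY_DISABLE && policy <= SCOPE_POLICY_LINK);
	switch (policy) {
	case SCOPE_POLICY_DISABLE:
		return 1;
	case SCOPE_POLICY_PRIVATE:
		return cls == ADDR_PRIVATE;
	case SCOPE_POLICY_LINK:
		return cls == ADDR_LINK_LOCAL;
	default:
		return 0;
	}
}
```

In the DNAT scenario described in the message, setting the value to 3 would let the link-local destination through while keeping scoping for everything else.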
      Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Turn flags in 'sctp_packet' into bit fields · a803c942
      Committed by Vlad Yasevich
      This shrinks the size of sctp_packet a little.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05
      Committed by Vlad Yasevich
      We had a bug where we never stored the user-defined value for
      MAXSEG when setting the value on an association.  Thus future
      PMTU events ended up rewriting the frag point and increasing
      it past the user limit.  Additionally, when setting the option on
      the socket/endpoint, we affected all current associations, which
      is against the spec.
      
      Now, we store the user 'maxseg' value along with the computed
      'frag_point'.  We inherit 'maxseg' from the socket at association
      creation and use it as an upper limit for 'frag_point' when it's
      set.
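The clamping rule can be sketched in a few lines. Treating a 'maxseg' of 0 as "not set" is an assumption of this example, and `effective_frag_point` is a hypothetical helper rather than a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Fragmentation point after a PMTU event: the PMTU-derived value, capped
 * by the user's maxseg when one is set (0 is assumed to mean "unset").
 */
static uint32_t effective_frag_point(uint32_t pmtu_frag_point,
				     uint32_t user_maxseg)
{
	assert(pmtu_frag_point > 0);
	if (user_maxseg != 0 && user_maxseg < pmtu_frag_point)
		return user_maxseg;
	return pmtu_frag_point;
}
```

Because the user value is stored separately, a later PMTU increase recomputes the frag point and clamps it again instead of silently exceeding the user limit.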
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Committed by Vlad Yasevich
      SCTP will delay the last part of a large write due to NAGLE, if that
      part is smaller than the MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting until the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better not to delay it.
      
      This is a result of much discussion with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6
      Committed by Vlad Yasevich
      SCTP has a problem: when small chunks are used, it is possible
      to exhaust the receive buffer without fully closing the receive window.
      This happens due to all the overhead that we have to account for with small
      messages.  To fix this, when the receive buffer is exceeded, we'll drop
      the window to 0 and save the 'drop' portion.  When the application starts
      reading data and freeing up receive buffer space, we'll wait until
      we've reached the 'drop' window and then add back this 'drop' one
      MTU at a time.  This worked well in testing and under stress produced
      rather even recovery.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Send user messages to the lower layer as one · 9c5c62be
      Committed by Vlad Yasevich
      Currently, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a single packet (e.g. due
      to the user setting a low fragmentation threshold).
      
      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down to the state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue,
      thus causing the chunks to get transmitted.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
  8. 03 Jun, 2009 1 commit
    • sctp: fix to choose alternate destination when retransmit ASCONF chunk · 9919b455
      Committed by Wei Yongjun
      RFC 5061 Section 5.1, ASCONF Chunk Procedures, says:
      
      B4)  Re-transmit the ASCONF Chunk last sent and if possible choose an
           alternate destination address (please refer to [RFC4960],
           Section 6.4.1).  An endpoint MUST NOT add new parameters to this
           chunk; it MUST be the same (including its Sequence Number) as
           the last ASCONF sent.  An endpoint MAY, however, bundle an
           additional ASCONF with new ASCONF parameters with the next
           Sequence Number.  For details, see Section 5.5.
      
      This patch chooses an alternate destination address when
      retransmitting the ASCONF chunk, and cleans up some duplicated code.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
  9. 16 Feb, 2009 2 commits
    • sctp: Fix the RTO-doubling on idle-link heartbeats · faee47cd
      Committed by Vlad Yasevich
      SCTP incorrectly doubles the RTO every time a Heartbeat chunk
      is generated.  However, RFC 4960 states:
      
         On an idle destination address that is allowed to heartbeat, it is
         recommended that a HEARTBEAT chunk is sent once per RTO of that
         destination address plus the protocol parameter 'HB.interval', with
         jittering of +/- 50% of the RTO value, and exponential backoff of the
         RTO if the previous HEARTBEAT is unanswered.
      
      Essentially, only if the heartbeat is unacknowledged do we double the RTO.
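The rule quoted from RFC 4960 can be sketched as a small helper that backs off only when the previous HEARTBEAT went unanswered. `rto_after_hb_timer` is a hypothetical name, and capping the result at an `rto_max` bound is an assumption of this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * RTO to use when the heartbeat timer fires.  The RTO is doubled only if
 * the previous HEARTBEAT was not acknowledged, and never exceeds rto_max.
 */
static uint32_t rto_after_hb_timer(uint32_t rto, uint32_t rto_max,
				   int prev_hb_acked)
{
	assert(rto <= rto_max);
	if (prev_hb_acked)
		return rto;
	/* Compare against rto_max / 2 so that rto * 2 cannot overflow. */
	return rto > rto_max / 2 ? rto_max : rto * 2;
}
```

On a healthy idle link every heartbeat is acknowledged, so the RTO stays put instead of creeping upward with each heartbeat sent.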
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Allow to disable SCTP checksums via module parameter · 06e86806
      Committed by Lucas Nussbaum
      This is a new version of my patch, now using a module parameter instead
      of a sysctl, so that the option is harder to find. Please note that,
      once the module is loaded, it is still possible to change the value of
      the parameter in /sys/module/sctp/parameters/, which is useful if you
      want to do performance comparisons without rebooting.
      
      Computation of SCTP checksums significantly affects the performance of
      SCTP. For example, using two dual-Opteron 246 machines connected over a GbE
      network, it was not possible to achieve more than ~730 Mbps, compared to
      941 Mbps after disabling SCTP checksums.
      Unfortunately, SCTP checksum offloading in NICs is not commonly
      available (yet).
      
      By default, checksums are still enabled, of course.
      Signed-off-by: Lucas Nussbaum <lucas.nussbaum@ens-lyon.fr>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 09 Oct, 2008 1 commit
    • sctp: Rework the tsn map to use generic bitmap. · 8e1ee18c
      Committed by Vlad Yasevich
      The tsn map currently in use is 4K large and is stuck inside
      the sctp_association structure, making memory references REALLY
      expensive.  What we really need is at most 4K worth of bits,
      so the biggest map we would have is 512 bytes.  Also, the
      map is only really useful when we have gaps to store and
      report.  As such, starting with a minimal map of, say, 32 TSNs (bits)
      should be enough for normal low-loss operations.  We can grow
      the map by some multiple of 32 along with some extra room any
      time we receive a TSN which would put us outside of the map
      boundary.  As we close gaps, we can shift the map to rebase
      it on the latest TSN we've seen.  This saves 4088 bytes per
      association just in the map alone, along with savings from the now
      unnecessary structure members.
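A small user-space sketch of the one-bit-per-TSN idea and the size arithmetic above. The helpers `map_bytes`, `map_mark` and `map_has` are hypothetical and much simpler than the kernel's generic bitmap API; growing and rebasing the map are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Bytes needed to track tsn_count TSNs at one bit each. */
static size_t map_bytes(size_t tsn_count)
{
	return (tsn_count + CHAR_BIT - 1) / CHAR_BIT;
}

/* Mark tsn as received in a map of map_bits bits based at base_tsn. */
static void map_mark(unsigned long *map, size_t map_bits,
		     uint32_t base_tsn, uint32_t tsn)
{
	uint32_t gap = tsn - base_tsn;	/* unsigned math tolerates TSN wrap */

	assert(gap < map_bits);		/* a real map would grow here instead */
	map[gap / WORD_BITS] |= 1UL << (gap % WORD_BITS);
}

/* Returns 1 if tsn is marked, 0 if it is unmarked or outside the map. */
static int map_has(const unsigned long *map, size_t map_bits,
		   uint32_t base_tsn, uint32_t tsn)
{
	uint32_t gap = tsn - base_tsn;

	if (gap >= map_bits)
		return 0;
	return (int)((map[gap / WORD_BITS] >> (gap % WORD_BITS)) & 1UL);
}
```

Here `map_bytes(4096)` gives the 512-byte upper bound mentioned above, while the initial 32-TSN map fits in 4 bytes.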
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 01 Oct, 2008 2 commits
  12. 04 Aug, 2008 1 commit
    • sctp: Drop ipfragok in sctp_xmit function · f880374c
      Committed by Herbert Xu
      The ipfragok flag controls whether the packet may be fragmented
      either on the local host or beyond.  The latter is only valid on
      IPv4.
      
      In fact, we never want to do the latter even on IPv4 when PMTU is
      enabled.  This is because even though we can't fragment packets
      within SCTP due to the protocol's inherent faults, we can still
      fragment it at the IP layer.  By setting the DF bit we will improve
      the PMTU process.
      
      RFC 2960 only says that we SHOULD clear the DF bit in this case,
      so we're compliant even if we set the DF bit.  In fact RFC 4960
      no longer has this statement.
      
      Once we make this change, we only need to control the local
      fragmentation.  There is already a bit in the skb which controls
      that, local_df.  So this patch sets that instead of using the
      ipfragok argument.
      
      The only complication is that there isn't a struct sock object
      per transport, so for IPv4 we have to resort to changing the
      pmtudisc field for every packet.  This should be safe though
      as the protocol is single-threaded.
      
      Note that after this patch we can remove ipfragok from the rest
      of the stack too.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 23 Jul, 2008 1 commit
  14. 19 Jul, 2008 1 commit
  15. 20 Jun, 2008 1 commit
    • sctp: Follow security requirement of responding with 1 packet · 2e3216cd
      Committed by Vlad Yasevich
      RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts
      
      When an SCTP stack receives a packet containing multiple control or
      DATA chunks and the processing of the packet requires the sending of
      multiple chunks in response, the sender of the response chunk(s) MUST
      NOT send more than one packet.  If bundling is supported, multiple
      response chunks that fit into a single packet MAY be bundled together
      into one single response packet.  If bundling is not supported, then
      the sender MUST NOT send more than one response chunk and MUST
      discard all other responses.  Note that this rule does NOT apply to a
      SACK chunk, since a SACK chunk is, in itself, a response to DATA and
      a SACK does not require a response of more DATA.
      
      We implement this by not servicing our outqueue until we reach the end
      of the packet.  This enables maximum bundling.  We also identify
      'response' chunks and make sure that we only send 1 packet when sending
      such chunks.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 05 Jun, 2008 4 commits
  17. 10 May, 2008 1 commit
  18. 24 Mar, 2008 1 commit
  19. 01 Mar, 2008 1 commit
  20. 05 Feb, 2008 1 commit
  21. 29 Jan, 2008 5 commits
  22. 21 Dec, 2007 1 commit
  23. 07 Dec, 2007 1 commit