1. 09 Sep, 2009 7 commits
  2. 06 Sep, 2009 6 commits
  3. 05 Sep, 2009 27 commits
    • net_sched: fix class grafting errno codes · c9f1d038
      Committed by Patrick McHardy
      If the parent qdisc doesn't support classes, use EOPNOTSUPP.
      If the parent class doesn't exist, use ENOENT. Currently EINVAL
      is returned in both cases.
      
      Additionally check whether grafting is supported and remove a now
      unnecessary graft function from sch_ingress.
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
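The errno selection described in the net_sched entry above can be sketched roughly as follows. This is a minimal illustration and not the kernel code: `struct demo_qdisc`, its fields, and `demo_graft_check()` are hypothetical stand-ins for the real qdisc structures.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model of a parent qdisc (not the kernel's struct Qdisc). */
struct demo_qdisc {
    int has_classes;               /* nonzero if the qdisc supports classes */
    int can_graft;                 /* nonzero if it implements grafting */
    const unsigned int *classids;  /* IDs of the classes that exist */
    size_t nclasses;
};

/* Return 0 if grafting under `classid` may proceed, else a negative errno. */
int demo_graft_check(const struct demo_qdisc *parent, unsigned int classid)
{
    size_t i;

    /* No class support, or no graft operation: not supported. */
    if (!parent->has_classes || !parent->can_graft)
        return -EOPNOTSUPP;

    for (i = 0; i < parent->nclasses; i++)
        if (parent->classids[i] == classid)
            return 0;

    /* The parent class does not exist. */
    return -ENOENT;
}
```

Under these assumptions a caller can tell "this qdisc cannot do that" apart from "no such class", which a blanket EINVAL did not allow.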
    • netlink: silence compiler warning · b1f57195
      Committed by Brian Haley
        CC      net/netlink/genetlink.o
      net/netlink/genetlink.c: In function ‘genl_register_mc_group’:
      net/netlink/genetlink.c:139: warning: ‘err’ may be used uninitialized in this function
      
      From following the code 'err' is initialized, but set it to zero to
      silence the warning.
      Signed-off-by: Brian Haley <brian.haley@hp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: Catch bogus stream sequence numbers · f1751c57
      Committed by Vlad Yasevich
      Since our TSN map is capable of holding at most a 4K chunk gap,
      there is no way that during this gap, a stream sequence number
      (unsigned short) can wrap such that the new number is smaller
      than the next expected one.  If such a case is encountered,
      this is a protocol violation.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
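The check described in the entry above could look roughly like this. The sketch assumes that "smaller" is evaluated with 16-bit serial-number arithmetic so that legitimate wraparound is tolerated; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number "less than" for 16-bit stream sequence numbers:
 * s is before t if (s - t), taken modulo 2^16, has its top bit set. */
static bool demo_ssn_lt(uint16_t s, uint16_t t)
{
    return ((uint16_t)(s - t) & 0x8000u) != 0;
}

/* An ordered DATA chunk whose SSN is behind the next expected SSN
 * is treated as a protocol violation in this model. */
bool demo_ssn_is_bogus(uint16_t received, uint16_t next_expected)
{
    return demo_ssn_lt(received, next_expected);
}
```

With this comparison, receiving SSN 2 while expecting 65530 counts as a forward wrap and is accepted, while receiving 5 while expecting 10 is flagged.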
    • sctp: remove dup code in net/sctp/output.c · be297143
      Committed by Wei Yongjun
      Use sctp_packet_reset() instead of duplicated code.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: turn flags in 'struct sctp_association' into bit fields · 9237ccbc
      Committed by Wei Yongjun
      This shrinks the size of struct sctp_association a little.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Sysctl configuration for IPv4 Address Scoping · 72388433
      Committed by Bhaskar Dutta
      This patch introduces a new sysctl option to make IPv4 Address Scoping
      configurable <draft-stewart-tsvwg-sctp-ipv4-00.txt>.
      
      In networking environments where DNAT rules in iptables prerouting
      chains convert destination IPs to link-local/private IP addresses,
      SCTP connections fail to establish as the INIT chunk is dropped by the
      kernel due to address scope match failure.
      For example, to support overlapping IP addresses (same IP address with
      different vlan id) a Layer-5 application listens on link-local IPs,
      and there is a DNAT rule that maps the destination IP to a link-local
      IP. Such applications never get the SCTP INIT if the address-scoping
      draft is strictly followed.
      
      This sysctl configuration allows SCTP to function in such
      unconventional networking environments.
      
      Sysctl options:
      0 - Disable IPv4 address scoping draft altogether
      1 - Enable IPv4 address scoping (default, current behavior)
      2 - Enable address scoping but allow IPv4 private addresses in INIT/INIT-ACK
      3 - Enable address scoping but allow IPv4 link-local addresses in INIT/INIT-ACK
      Signed-off-by: Bhaskar Dutta <bhaskar.dutta@globallogic.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
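As a rough illustration of the four sysctl values listed in the entry above, the sketch below names them and shows which IPv4 address class each relaxed mode exempts from scoping. All identifiers are hypothetical; the full scoping rules of the draft are not modeled, only the exemptions the option list describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the four sysctl values. */
enum demo_scope_policy {
    DEMO_SCOPE_DISABLE = 0,          /* scoping off altogether */
    DEMO_SCOPE_ENABLE = 1,           /* strict scoping (default) */
    DEMO_SCOPE_ALLOW_PRIVATE = 2,    /* scoping, but private addresses allowed */
    DEMO_SCOPE_ALLOW_LINK_LOCAL = 3  /* scoping, but link-local addresses allowed */
};

enum demo_v4_class { DEMO_V4_LINK_LOCAL, DEMO_V4_PRIVATE, DEMO_V4_OTHER };

/* Classify an IPv4 address given in host byte order. */
enum demo_v4_class demo_classify_v4(uint32_t addr)
{
    if ((addr & 0xFFFF0000u) == 0xA9FE0000u)   /* 169.254.0.0/16 */
        return DEMO_V4_LINK_LOCAL;
    if ((addr & 0xFF000000u) == 0x0A000000u || /* 10.0.0.0/8 */
        (addr & 0xFFF00000u) == 0xAC100000u || /* 172.16.0.0/12 */
        (addr & 0xFFFF0000u) == 0xC0A80000u)   /* 192.168.0.0/16 */
        return DEMO_V4_PRIVATE;
    return DEMO_V4_OTHER;
}

/* True if the policy explicitly exempts this address class from scoping. */
bool demo_policy_exempts(enum demo_scope_policy policy, enum demo_v4_class cls)
{
    switch (policy) {
    case DEMO_SCOPE_DISABLE:
        return true;
    case DEMO_SCOPE_ALLOW_PRIVATE:
        return cls == DEMO_V4_PRIVATE;
    case DEMO_SCOPE_ALLOW_LINK_LOCAL:
        return cls == DEMO_V4_LINK_LOCAL;
    default:
        return false;                /* strict mode exempts nothing */
    }
}
```

In the DNAT scenario from the entry, a destination rewritten to 169.254.x.x would be exempt under value 3 (or 0) in this model, and not under the default value 1.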
    • sctp: Get rid of an extra routing lookup when adding a transport. · 8da645e1
      Committed by Vlad Yasevich
      We used to perform 2 routing lookups for a new transport: one
      just for path mtu detection, and one to actually route to destination
      and path mtu update when sending a packet.  There is no point in doing
      both of them, especially since the first one just for path mtu doesn't
      take into account source address and sometimes gives the wrong route,
      causing path mtu updates anyway.
      
      We now do just the one call to do both route to destination and get
      path mtu updates.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Turn flags in 'sctp_packet' into bit fields · a803c942
      Committed by Vlad Yasevich
      This shrinks the size of sctp_packet a little.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Correctly track if AUTH has been bundled. · 4007cc88
      Committed by Vlad Yasevich
      We currently track if AUTH has been bundled using the 'auth'
      pointer to the chunk.  However, AUTH is disallowed after DATA
      is already in the packet, so we need to instead use the
      'has_auth' field.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: fix to reset packet information after packet transmit · d521c08f
      Committed by Wei Yongjun
      The packet information is not reset after a packet is transmitted.
      This may cause problems, such as a following DATA chunk being sent
      without an AUTH chunk even though the peer has requested
      authentication of DATA chunks.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Failover transmitted list on transport delete · 31b02e15
      Committed by Vlad Yasevich
      The Add-IP feature allows users to delete an active transport.  If that
      transport has chunks in flight, those chunks need to be moved to another
      transport or the association may get into an unrecoverable state.
      Reported-by: Rafael Laufer <rlaufer@cisco.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Fix SCTP_MAXSEG socket option to comply to spec. · f68b2e05
      Committed by Vlad Yasevich
      We had a bug where we never stored the user-defined value for
      MAXSEG when setting the value on an association.  Thus future
      PMTU events ended up rewriting the frag point and increasing
      it past the user limit.  Additionally, when setting the option on
      the socket/endpoint, we affect all current associations, which
      is against spec.
      
      Now, we store the user 'maxseg' value along with the computed
      'frag_point'.  We inherit 'maxseg' from the socket at association
      creation and use it as an upper limit for 'frag_point' when it's
      set.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
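The relationship between the stored 'maxseg' and the computed 'frag_point' in the entry above can be sketched like this. The function name and the convention that 0 means "not set" are assumptions made for illustration.

```c
#include <stdint.h>

/* Recompute the fragmentation point after a PMTU event.
 * pmtu_frag:   fragment size derived from the current path MTU
 * user_maxseg: value stored from SCTP_MAXSEG, 0 if the user never set it */
uint32_t demo_frag_point(uint32_t pmtu_frag, uint32_t user_maxseg)
{
    /* The user value acts only as an upper limit. */
    if (user_maxseg != 0 && user_maxseg < pmtu_frag)
        return user_maxseg;
    return pmtu_frag;
}
```

Because the user value is kept alongside the result rather than overwritten by it, a later PMTU increase cannot push the fragmentation point past the user's limit in this model.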
    • sctp: Don't do NAGLE delay on large writes that were fragmented small · cb95ea32
      Committed by Vlad Yasevich
      SCTP will delay the last part of a large write due to NAGLE, if that
      part is smaller than MTU.  Since we are doing large writes, we might
      as well send the last portion now instead of waiting until the next
      large write happens.  The small portion will be sent as is regardless,
      so it's better to not delay it.
      
      This is a result of much discussion with Wei Yongjun <yjwei@cn.fujitsu.com>
      and Doug Graham <dgraham@nortel.com>.  Many thanks go out to them.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Nagle delay should be based on path mtu · b29e7907
      Committed by Vlad Yasevich
      The decision to delay due to Nagle should be based on the path mtu
      and future packet size.  We currently incorrectly base it on
      'frag_point' which is the SCTP DATA segment size, and also we do
      not count DATA chunk header overhead in the computation.  This
      actually allows situations where a user can set a low 'frag_point'
      and then send small messages without delay.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
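A minimal sketch of the decision described in the entry above, assuming the caller supplies the payload room left by the path mtu after IP and SCTP common headers. The function and parameter names are hypothetical; only the 16-byte DATA chunk header size comes from the SCTP specification.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_DATA_CHUNK_HDR 16u  /* SCTP DATA chunk header size in bytes */

/* Decide whether Nagle should hold back a small DATA chunk.
 * max_packet_payload: path mtu minus IP and SCTP common headers */
bool demo_nagle_delay(bool nodelay, size_t bytes_in_flight, size_t queued_len,
                      size_t payload_len, size_t max_packet_payload)
{
    /* Future packet size counts the DATA chunk header as well. */
    size_t future_len = queued_len + payload_len + DEMO_DATA_CHUNK_HDR;

    /* Nagle only applies when enabled and when data is already in flight. */
    if (nodelay || bytes_in_flight == 0)
        return false;

    /* Delay only if the packet would still be smaller than the path allows. */
    return future_len < max_packet_payload;
}
```

Since the comparison uses the path-derived limit rather than 'frag_point', lowering the fragmentation point no longer disables the delay in this model.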
    • sctp: Try not to change a_rwnd when faking a SACK from SHUTDOWN. · d4d6fb57
      Committed by Vlad Yasevich
      We currently set a_rwnd to 0 when faking a SACK from SHUTDOWN.
      This results in a hung association if the remote only uses
      SHUTDOWNs (which it's allowed to do) to acknowledge DATA when
      closing.  The reason for that is that we simply honor the a_rwnd
      from the sack, but since we faked it to be 0, we enter 0-window
      probing.  The fix is to use the peer's old rwnd and add our flight
      size to it.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
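The arithmetic behind the fix in the entry above can be sketched as follows. It assumes a simplified SACK handler that derives the peer's window by subtracting outstanding bytes from a_rwnd; both function names are hypothetical.

```c
#include <stdint.h>

/* a_rwnd to place in the faked SACK: the peer's old rwnd plus our flight size. */
uint32_t demo_fake_a_rwnd(uint32_t peer_rwnd, uint32_t flight_size)
{
    uint32_t sum = peer_rwnd + flight_size;

    return sum < peer_rwnd ? UINT32_MAX : sum;  /* saturate on overflow */
}

/* Simplified SACK processing: outstanding bytes are subtracted from a_rwnd. */
uint32_t demo_rwnd_from_sack(uint32_t a_rwnd, uint32_t outstanding)
{
    return a_rwnd > outstanding ? a_rwnd - outstanding : 0;
}
```

With a faked a_rwnd of 0 the derived window is always 0, which is what triggers zero-window probing. Adding the flight size first lets the subtraction land back on the peer's previous window.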
    • sctp: drop a_rwnd to 0 when receive buffer overflows. · 4d3c46e6
      Committed by Vlad Yasevich
      SCTP has a problem that when small chunks are used, it is possible
      to exhaust the receive buffer without fully closing the receive window.
      This happens due to all the overhead that we have to account for with
      small messages.  To fix this, when the receive buffer is exceeded, we'll
      drop the window to 0 and save the 'drop' portion.  When the application
      starts reading data and freeing up receive buffer space, we'll wait until
      we've reached the 'drop' window and then add back this 'drop' one
      mtu at a time.  This worked well in testing and under stress produced
      rather even recovery.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
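The drop-and-restore behaviour described in the entry above might be modeled like this. The structure and function names are hypothetical, and the model ignores everything except the two counters the entry mentions.

```c
#include <stdint.h>

/* Hypothetical receive-window state. */
struct demo_rx_window {
    uint32_t rwnd;     /* window currently advertised */
    uint32_t dropped;  /* the saved 'drop' portion */
};

/* Receive buffer exceeded: close the window and remember what was dropped. */
void demo_rx_overflow(struct demo_rx_window *w)
{
    w->dropped += w->rwnd;
    w->rwnd = 0;
}

/* Application freed `len` bytes; once the window has grown back to the
 * dropped amount, return the dropped portion at most one mtu at a time. */
void demo_rx_freed(struct demo_rx_window *w, uint32_t len, uint32_t mtu)
{
    w->rwnd += len;
    if (w->dropped != 0 && w->rwnd >= w->dropped) {
        uint32_t back = w->dropped < mtu ? w->dropped : mtu;

        w->rwnd += back;
        w->dropped -= back;
    }
}
```

Returning the saved portion in mtu-sized steps is what spreads the recovery out instead of reopening the whole window at once.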
    • sctp: Clear fast_recovery on the transport when T3 timer expires. · 33ce8281
      Committed by Vlad Yasevich
      If the T3 timer expires, we are retransmitting data due to timeout and
      any fast recovery is null and void.  We can clear the fast recovery
      flag.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Fix error count increments that were results of HEARTBEATS · b9f84786
      Committed by Vlad Yasevich
      SCTP RFC 4960 states that unacknowledged HEARTBEATS count as
      errors against a given transport or endpoint.  As such, we
      should increment the error counts only for unacknowledged
      HBs, otherwise we detect failure too soon.  This goes for both
      the overall error count and the path error count.
      
      Now, there is a difference in how the detection is done
      between the two.  The path error detection is done after
      the increment, so to detect it properly, we actually need
      to exceed the path threshold.  The overall error detection
      is done _BEFORE_ the increment.  Thus to detect the failure,
      it's enough for the error count to match the threshold.
      This is why all the state functions use '>=' to detect failure,
      while path detection uses '>'.
      
      Thanks go to Chunbo Luo <chunbo.luo@windriver.com> who first
      proposed patches to fix this issue and made me re-read the spec
      and the code to figure out how this cruft really works.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
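The difference between the two comparisons in the entry above comes down to whether the check runs before or after the increment. A small sketch, with hypothetical function and parameter names:

```c
#include <stdbool.h>

/* Overall (association) check: evaluated BEFORE the error count is
 * incremented, so matching the threshold already means failure. */
bool demo_assoc_failed(unsigned int overall_error_count, unsigned int max_retrans)
{
    return overall_error_count >= max_retrans;
}

/* Path check: the count is incremented first, so the threshold
 * has to be exceeded before the path is considered failed. */
bool demo_path_failed(unsigned int *path_error_count, unsigned int path_max_retrans)
{
    (*path_error_count)++;
    return *path_error_count > path_max_retrans;
}
```

Both forms tolerate the same number of errors under these assumptions; the operator differs only because the counter is read at a different point.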
    • sctp: use proc_create() · d71a09ed
      Committed by Alexey Dobriyan
      create_proc_entry() is deprecated (not formally, though).
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: fix check the chunk length of received HEARTBEAT-ACK chunk · dadb50cc
      Committed by Wei Yongjun
      The receiver of the HEARTBEAT should respond with a HEARTBEAT ACK
      that contains the Heartbeat Information field copied from the
      received HEARTBEAT chunk. So the received HEARTBEAT-ACK chunk
      must have a length of:
        sizeof(sctp_chunkhdr_t) + sizeof(sctp_sender_hb_info_t)
      
      With a badly formatted HB-ACK chunk, it is possible that we may access
      invalid memory.  We should really make sure that the chunk format
      is what we expect, before attempting to touch the data.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
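A length check of the kind described in the entry above could be sketched as below. The heartbeat-info layout is a hypothetical placeholder (the real `sctp_sender_hb_info_t` differs), and treating the stated size as a minimum rather than an exact match is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SCTP chunk header: type, flags, length. */
struct demo_chunkhdr {
    uint8_t  type;
    uint8_t  flags;
    uint16_t length;
};

/* Hypothetical heartbeat info block echoed back by the peer. */
struct demo_hb_info {
    uint16_t param_type;
    uint16_t param_len;
    uint8_t  opaque[40];
};

/* chunk_len: the chunk's length field, already converted to host order.
 * Returns true only if the chunk is long enough to hold the info block. */
bool demo_hb_ack_len_ok(size_t chunk_len)
{
    return chunk_len >= sizeof(struct demo_chunkhdr) + sizeof(struct demo_hb_info);
}
```

The point is to run this check before any field of the heartbeat info is read, so a truncated HB-ACK is rejected instead of being dereferenced.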
    • sctp: drop SHUTDOWN chunk if the TSN is less than the CTSN · a2f36eec
      Committed by Wei Yongjun
      If the Cumulative TSN Ack field of the SHUTDOWN chunk is less than the
      Cumulative TSN Ack Point, then drop the SHUTDOWN chunk.
      Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
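The comparison in the entry above has to cope with TSN wraparound. The sketch below assumes 32-bit serial-number arithmetic for "less than"; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Serial-number "less than" for 32-bit TSNs: s is before t if
 * (s - t), taken modulo 2^32, has its top bit set. */
static bool demo_tsn_lt(uint32_t s, uint32_t t)
{
    return ((s - t) & 0x80000000u) != 0;
}

/* Drop a SHUTDOWN whose Cumulative TSN Ack is behind our ack point. */
bool demo_drop_shutdown(uint32_t shutdown_ctsn, uint32_t ctsn_ack_point)
{
    return demo_tsn_lt(shutdown_ctsn, ctsn_ack_point);
}
```

A SHUTDOWN carrying a value just past a wrap (for example 5 when the ack point is 0xFFFFFFF0) is still accepted in this model, since it is ahead of the ack point in serial order.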
    • sctp: Send user messages to the lower layer as one · 9c5c62be
      Committed by Vlad Yasevich
      Currently, sctp breaks up user messages into fragments and
      sends each fragment to the lower layer by itself.  This means
      that for each fragment we go all the way down the stack
      and back up.  This also discourages bundling of multiple
      fragments when they can fit into a single packet (e.g. due
      to the user setting a low fragmentation threshold).
      
      We introduce a new command SCTP_CMD_SND_MSG and hand the
      whole message down to the state machine.  The state machine and
      the side-effect parser will cork the queue, add all chunks
      from the message to the queue, and then un-cork the queue,
      thus causing the chunks to get transmitted.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
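The cork, enqueue, un-cork sequence from the entry above can be illustrated with a toy queue. Everything here is hypothetical and only counts how many trips to the lower layer each approach takes.

```c
#include <stdbool.h>

/* Toy outbound queue. */
struct demo_outq {
    bool corked;
    int queued;       /* chunks waiting in the queue */
    int transmitted;  /* chunks handed to the lower layer */
    int flushes;      /* trips down the stack */
};

/* Hand everything queued to the lower layer in one trip. */
static void demo_flush(struct demo_outq *q)
{
    if (q->queued == 0)
        return;
    q->transmitted += q->queued;
    q->queued = 0;
    q->flushes++;
}

/* Old behaviour when uncorked: every fragment triggers its own flush. */
void demo_enqueue(struct demo_outq *q)
{
    q->queued++;
    if (!q->corked)
        demo_flush(q);
}

/* New behaviour: cork, queue all fragments of the message, then un-cork. */
void demo_send_msg(struct demo_outq *q, int nfrags)
{
    int i;

    q->corked = true;
    for (i = 0; i < nfrags; i++)
        demo_enqueue(q);
    q->corked = false;
    demo_flush(q);
}
```

For a five-fragment message the per-fragment path makes five trips in this model, while the corked path makes one and leaves all five chunks available for bundling.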
    • sctp: Try to encourage SACK bundling with DATA. · 5d7ff261
      Committed by Vlad Yasevich
      If the association has a SACK timer pending and no DATA queued
      to be sent, we'll try to bundle the SACK with the next application send.
      As such, try to encourage bundling by accounting for the SACK in the size
      of the first chunk fragment.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Generate SACKs when actually sending outbound DATA · e83963b7
      Committed by Vlad Yasevich
      We are now trying to bundle SACKs when we have outbound
      DATA to send.  However, there are situations where this
      outbound DATA will not be sent (due to congestion or
      available window).  In such cases it's ok to wait for the
      timer to expire.  This patch refactors the sending code
      so that before attempting to bundle the SACK we check
      to see if the DATA will actually be transmitted.
      
      Based on earlier work by Doug Graham <dgraham@nortel.com> and
      Wei Yongjun <yjwei@cn.fujitsu.com>.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Fix data segmentation with small frag_size · 3e62abf9
      Committed by Vlad Yasevich
      Since an application may specify the maximum SCTP fragment size
      that all data should be fragmented to, we need to fix how
      we do segmentation.   Right now, if a user specifies a small
      fragment size, the segment size can go negative in the presence
      of AUTH or COOKIE_ECHO bundling.
      
      What we need to do is track the largest possible DATA chunk that
      can fit into the mtu.  Then if the fragment size specified is
      bigger than this maximum length, we'll shrink it down.  Otherwise,
      we just use the smaller segment size without changing it further.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
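The segmentation rule in the entry above can be sketched in two steps: compute the largest DATA payload that fits, then clamp the user's fragment size to it. The function names and the way overhead is passed in are assumptions for illustration.

```c
#include <stddef.h>

/* Largest DATA payload that fits in one packet.
 * hdr_overhead: IP, SCTP common header and DATA chunk header bytes
 * bundled_len:  bytes taken by a bundled AUTH or COOKIE_ECHO chunk, if any */
size_t demo_max_data(size_t pathmtu, size_t hdr_overhead, size_t bundled_len)
{
    size_t used = hdr_overhead + bundled_len;

    return used >= pathmtu ? 0 : pathmtu - used;  /* never goes negative */
}

/* Shrink the user's fragment size only if it exceeds what fits. */
size_t demo_segment_len(size_t frag_size, size_t max_data)
{
    return frag_size > max_data ? max_data : frag_size;
}
```

Subtracting bundling overhead from the mtu-derived maximum, rather than from a small user-specified fragment size, is what keeps the result from underflowing in this model.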
    • sctp: Disallow new connection on a closing socket · bec9640b
      Committed by Vlad Yasevich
      If a socket has a lot of associations that are in the process of
      being closed/aborted, it is possible for a remote to establish
      new associations during the time period that the old ones are shutting
      down.  If this was a result of a close() call, there will be no socket
      and this will cause a memory leak.  We'll prevent this by setting the
      socket state to CLOSING and disallowing new associations when in this state.
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
    • sctp: Fix piggybacked ACKs · af87b823
      Committed by Doug Graham
      This patch corrects the conditions under which a SACK will be piggybacked
      on a DATA packet.  The previous condition was incorrect due to a
      misinterpretation of RFC 4960 and/or RFC 2960.  Specifically, the
      following paragraph from section 6.2 had not been implemented correctly:
      
         Before an endpoint transmits a DATA chunk, if any received DATA
         chunks have not been acknowledged (e.g., due to delayed ack), the
         sender should create a SACK and bundle it with the outbound DATA
         chunk, as long as the size of the final SCTP packet does not exceed
         the current MTU.  See Section 6.2.
      
      When about to send a DATA chunk, the code now checks to see if the SACK
      timer is running.  If it is, we know we have a SACK to send to the
      peer, so we append the SACK (assuming available space in the packet)
      and turn off the timer.  For a simple request-response scenario, this
      will result in the SACK being bundled with the response, meaning that
      the SACK is received quickly by the client, and also meaning that no
      separate SACK packet needs to be sent by the server to acknowledge the
      request.  Prior to this patch, a separate SACK packet would have been
      sent by the server SCTP only after its delayed-ACK timer had expired
      (usually 200ms).  This is wasteful of bandwidth, and can also have a
      major negative impact on performance due to the interaction of delayed
      ACKs with the Nagle algorithm.
      Signed-off-by: Doug Graham <dgraham@nortel.com>
      Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
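The piggybacking condition described in the entry above can be sketched as follows. The structures and the function are hypothetical simplifications; the only logic modeled is "SACK timer running, room in the packet, then bundle and stop the timer".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet under construction. */
struct demo_packet {
    size_t size;    /* bytes already in the packet */
    size_t mtu;     /* current MTU limit for the packet */
    bool has_sack;
};

/* Hypothetical association state. */
struct demo_assoc {
    bool sack_timer_running;  /* a delayed SACK is pending */
};

/* Called just before a DATA chunk of data_len bytes is appended.
 * Returns true if a SACK of sack_len bytes was bundled. */
bool demo_bundle_sack(struct demo_assoc *asoc, struct demo_packet *pkt,
                      size_t sack_len, size_t data_len)
{
    if (!asoc->sack_timer_running || pkt->has_sack)
        return false;

    /* The final packet must not exceed the current MTU. */
    if (pkt->size + sack_len + data_len > pkt->mtu)
        return false;

    pkt->size += sack_len;
    pkt->has_sack = true;
    asoc->sack_timer_running = false;  /* the SACK rides with the DATA */
    return true;
}
```

In a request-response exchange the server's reply would carry the SACK under these assumptions, so the client does not have to wait for the delayed-ACK timer.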