1. 22 Mar, 2017 1 commit
  2. 10 Mar, 2017 1 commit
    • net: Work around lockdep limitation in sockets that use sockets · cdfbabfb
      David Howells committed
      Lockdep issues a circular dependency warning when AFS issues an operation
      through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.
      
      The theory lockdep comes up with is as follows:
      
       (1) If the pagefault handler decides it needs to read pages from AFS, it
           calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
           creating a call requires the socket lock:
      
      	mmap_sem must be taken before sk_lock-AF_RXRPC
      
       (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
           binds the underlying UDP socket whilst holding its socket lock.
           inet_bind() takes its own socket lock:
      
      	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET
      
       (3) Reading from a TCP socket into a userspace buffer might cause a fault
           and thus cause the kernel to take the mmap_sem, but the TCP socket is
           locked whilst doing this:
      
      	sk_lock-AF_INET must be taken before mmap_sem
      
      However, lockdep's theory is wrong in this instance because it deals only
      with lock classes and not individual locks.  The AF_INET lock in (2) isn't
      really equivalent to the AF_INET lock in (3) as the former deals with a
      socket entirely internal to the kernel that never sees userspace.  This is
      a limitation in the design of lockdep.
      
      Fix the general case by:
      
       (1) Double up all the locking keys used in sockets so that one set are
           used if the socket is created by userspace and the other set is used
           if the socket is created by the kernel.
      
       (2) Store the kern parameter passed to sk_alloc() in a variable in the
           sock struct (sk_kern_sock).  This informs sock_lock_init(),
           sock_init_data() and sk_clone_lock() as to the lock keys to be used.
      
           Note that the child created by sk_clone_lock() inherits the parent's
           kern setting.
      
       (3) Add a 'kern' parameter to ->accept() that is analogous to the one
           passed in to ->create() that distinguishes whether kernel_accept() or
           sys_accept4() was the caller and can be passed to sk_alloc().
      
           Note that a lot of accept functions merely dequeue an already
           allocated socket.  I haven't touched these as the new socket already
           exists before we get the parameter.
      
           Note also that there are a couple of places where I've made the accepted
           socket unconditionally kernel-based:
      
      	irda_accept()
      	rds_tcp_accept_one()
      	tcp_accept_from_sock()
      
           because they follow a sock_create_kern() and accept off of that.
      
      Whilst creating this, I noticed that lustre and ocfs don't create sockets
      through sock_create_kern() and thus they aren't marked as for-kernel,
      though they appear to be internal.  I wonder if these should do that so
      that they use the new set of lock keys.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
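The key-doubling idea in this commit can be sketched as a small userspace model. This is only an illustration under stated assumptions: the names `sock_model`, `sock_model_init`, `sock_model_clone` and the string key tables are hypothetical, and real lockdep keys are `struct lock_class_key` objects rather than strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of per-family lock classes. */
enum { FAM_INET, FAM_RXRPC, FAM_MAX };

/* One set of keys for sockets created by userspace ... */
static const char *const user_keys[FAM_MAX] = {
    "sk_lock-AF_INET", "sk_lock-AF_RXRPC",
};
/* ... and a second set for sockets created by the kernel (names illustrative). */
static const char *const kern_keys[FAM_MAX] = {
    "k-sk_lock-AF_INET", "k-sk_lock-AF_RXRPC",
};

struct sock_model {
    int family;
    bool kern_sock;          /* models sk_kern_sock */
    const char *lock_class;  /* which lock class this socket's lock belongs to */
};

/* Pick the key set from the 'kern' flag, as sk_alloc()'s parameter would. */
static void sock_model_init(struct sock_model *sk, int family, bool kern)
{
    assert(sk != NULL && family >= 0 && family < FAM_MAX);
    sk->family = family;
    sk->kern_sock = kern;
    sk->lock_class = kern ? kern_keys[family] : user_keys[family];
}

/* A cloned child inherits the parent's kern setting and therefore its key set. */
static void sock_model_clone(struct sock_model *child,
                             const struct sock_model *parent)
{
    sock_model_init(child, parent->family, parent->kern_sock);
}
```

With two key sets, the kernel-internal AF_INET socket from step (2) and the userspace AF_INET socket from step (3) land in different lock classes, so the dependency chain no longer closes into a cycle.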
  3. 20 Feb, 2017 1 commit
    • sctp: add support for MSG_MORE · 4ea0c32f
      Xin Long committed
      This patch is to add support for MSG_MORE on sctp.
      
      It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
      creating datamsg according to the send flag. sctp_packet_can_append_data
      then uses it to decide whether the chunks of this msg will be sent at
      once or delayed.
      
      Note that unlike [1], this patch saves MSG_MORE in datamsg, instead of
      in assoc. sctp enqueues the chunks first, then dequeues them one by
      one, so if the flag were saved in assoc, the current msg's send flag
      (MSG_MORE) might affect other chunks' bundling.
      
      Since the last patch, sctp flushes the out queue once the assoc state
      falls into SHUTDOWN_PENDING, so the close block problem mentioned in [1]
      has been solved as well.
      
      [1] https://patchwork.ozlabs.org/patch/372404/
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
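A minimal sketch of why the flag is kept per message: each message records its own MSG_MORE state, so one message's flag cannot leak into another's bundling decision. The names `datamsg_model`, `datamsg_model_init` and `can_flush_now` are hypothetical, and the real sctp_packet_can_append_data() weighs more conditions than this single bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>   /* MSG_MORE (Linux) */

/* Hypothetical stand-in for sctp_datamsg; only force_delay is modeled. */
struct datamsg_model {
    bool force_delay;
};

/* Record the sender's MSG_MORE flag on the message itself. */
static void datamsg_model_init(struct datamsg_model *msg, int send_flags)
{
    assert(msg != NULL);
    msg->force_delay = (send_flags & MSG_MORE) != 0;
}

/* Simplified decision: a message sent with MSG_MORE is held back. */
static bool can_flush_now(const struct datamsg_model *msg)
{
    return !msg->force_delay;
}
```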
  4. 08 Feb, 2017 2 commits
  5. 19 Jan, 2017 5 commits
    • sctp: implement sender-side procedures for SSN Reset Request Parameter · 7f9d68ac
      Xin Long committed
      This patch is to implement sender-side procedures for the Outgoing
      and Incoming SSN Reset Request Parameter described in rfc6525 section
      5.1.2 and 5.1.3.
      
      It also adds sockopt SCTP_RESET_STREAMS, described in rfc6525 section
      6.3.2, for users.
      
      Note that the new asoc member strreset_outstanding makes sure that
      only one reconf request chunk is in flight, as rfc6525 section 5.1.1
      demands.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
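The "one request in flight" rule can be sketched as a counter that gates new requests. This is a hedged model, not the kernel code: `asoc_model`, `send_reset_request` and `on_reset_response` are hypothetical names, and returning -EINPROGRESS for a second request is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the association; only the counter is modeled. */
struct asoc_model {
    unsigned int strreset_outstanding;  /* requests awaiting a response */
};

/* Refuse a new reconf request while an earlier one is still unanswered. */
static int send_reset_request(struct asoc_model *asoc)
{
    assert(asoc != NULL);
    if (asoc->strreset_outstanding)
        return -EINPROGRESS;  /* assumed error code for this sketch */
    asoc->strreset_outstanding = 1;
    return 0;
}

/* A response clears the outstanding request so the next one may go out. */
static void on_reset_response(struct asoc_model *asoc)
{
    assert(asoc != NULL);
    if (asoc->strreset_outstanding)
        asoc->strreset_outstanding--;
}
```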
    • sctp: add sockopt SCTP_ENABLE_STREAM_RESET · 9fb657ae
      Xin Long committed
      This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set
      strreset_enable to indicate which reconf request type it supports,
      which is described in rfc6525 section 6.3.1.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add reconf_enable in asoc ep and netns · c28445c3
      Xin Long committed
      This patch is to add a reconf_enable field in each of asoc, ep and netns
      to indicate if they support stream reset.
      
      When initializing, asoc reconf_enable gets its default value from ep
      reconf_enable, which in turn defaults to netns reconf_enable.
      
      It also adds reconf_capable in the asoc peer part to record whether the
      peer supports reconf. The value is set if the ext params have reconf
      chunk support when processing the init chunk, just as rfc6525 section
      5.1.1 demands.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add stream reconf timer · 7b9438de
      Xin Long committed
      This patch is to add a per transport timer based on sctp timer frame
      for stream reconf chunk retransmission. It would start after sending
      a reconf request chunk, and stop after receiving the response chunk.
      
      If the timer expires, besides retransmitting the reconf request chunk,
      it would also do the same things as the data RTO timer, such as
      increasing the appropriate error counts and performing threshold
      management, possibly destroying the asoc if sctp retransmission
      thresholds are exceeded, just as section 5.1.1 describes.
      
      This patch is also to add asoc strreset_chunk, it is used to save the
      reconf request chunk, so that it can be retransmitted, and to check if
      the response is really for this request by comparing the information
      inside with the response chunk as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add support for generating stream reconf ssn reset request chunk · cc16f00f
      Xin Long committed
      This patch is to add asoc strreset_outseq and strreset_inseq for
      saving the reconf request sequence, and to initialize them when
      creating the assoc and processing init. It also defines the Incoming
      and Outgoing SSN Reset Request Parameters described in rfc6525 sections
      4.1 and 4.2. As they can be in the same chunk, as rfc6525 section 3.1-3
      describes, they are built in one function.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 07 Jan, 2017 1 commit
    • sctp: prepare asoc stream for stream reconf · a8386317
      Xin Long committed
      sctp stream reconf, described in RFC 6525, needs a structure to
      save per stream information in assoc, like stream state.
      
      In the future, sctp stream scheduler also needs it to save some
      stream scheduler params and queues.
      
      This patchset is to prepare the stream array in assoc for stream
      reconf. It defines sctp_stream that includes stream arrays inside
      to replace ssnmap.
      
      Note that we use different structures for IN and OUT streams, as
      the members in per OUT stream will get more and more different
      from per IN stream.
      
      v1->v2:
        - put these patches into a smaller group.
      v2->v3:
        - define sctp_stream to contain stream arrays, and create stream.c
          to put stream-related functions.
        - merge 3 patches into 1, as the new sctp_stream has the same name
          as before.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 29 Dec, 2016 1 commit
  8. 17 Nov, 2016 1 commit
    • sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long committed
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      Inserting a new node may then keep returning -EBUSY, as the chain is
      too long and sctp inserts a transport node in a loop, which could even
      lead to system hangs.
      
      The new rhlist interface handles this case, where there are many nodes
      with the same key in one chain. It puts them into a list and then makes
      this list a single node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
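The effect of grouping same-key nodes can be sketched with a toy chain. This is not the kernel's rhltable implementation; `tnode`, `chain`, `chain_insert`, `chain_length` and `same_key_count` are hypothetical names, and the position at which a duplicate is linked into its list is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: duplicates of a key hang off one chain entry via next_same_key. */
struct tnode {
    int key;
    struct tnode *next_same_key;
    struct tnode *next_in_chain;
};

struct chain {
    struct tnode *head;
};

/* Insert n; if its key is already in the chain, join that key's list instead
 * of lengthening the chain. */
static void chain_insert(struct chain *c, struct tnode *n)
{
    assert(c != NULL && n != NULL);
    n->next_same_key = NULL;
    n->next_in_chain = NULL;
    for (struct tnode *p = c->head; p != NULL; p = p->next_in_chain) {
        if (p->key == n->key) {
            n->next_same_key = p->next_same_key;
            p->next_same_key = n;
            return;
        }
    }
    n->next_in_chain = c->head;
    c->head = n;
}

/* Chain length counts distinct keys only. */
static size_t chain_length(const struct chain *c)
{
    size_t len = 0;
    for (const struct tnode *p = c->head; p != NULL; p = p->next_in_chain)
        len++;
    return len;
}

/* Number of nodes sharing one key. */
static size_t same_key_count(const struct chain *c, int key)
{
    for (const struct tnode *p = c->head; p != NULL; p = p->next_in_chain) {
        if (p->key == key) {
            size_t n = 0;
            for (const struct tnode *q = p; q != NULL; q = q->next_same_key)
                n++;
            return n;
        }
    }
    return 0;
}
```

However many transports share one key, the chain itself grows by a single entry, which is why the chain-too-long condition no longer arises in this model.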
  9. 13 Oct, 2016 2 commits
  10. 30 Sep, 2016 2 commits
    • sctp: remove prsctp_param from sctp_chunk · 0605483f
      Xin Long committed
      Now sctp uses chunk->prsctp_param to save the prsctp param for all the
      prsctp policies, but we didn't need to introduce prsctp_param to
      sctp_chunk. We can just use chunk->sinfo.sinfo_timetolive for the RTX and
      BUF policies, and reuse msg->expires_at for the TTL policy, as the prsctp
      policies and the old expires policy are mutually exclusive.
      
      This patch is to remove prsctp_param from sctp_chunk, and reuse msg's
      expires_at for TTL and chunk's sinfo.sinfo_timetolive for the RTX and BUF
      policies.
      
      Note that sctp can't use chunk's sinfo.sinfo_timetolive for the TTL policy,
      as it needs a u64 variable to save the expires_at time.
      
      This one also fixes the "netperf-Throughput_Mbps -37.2% regression"
      issue.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: move sent_count to the memory hole in sctp_chunk · 73dca124
      Xin Long committed
      Running pahole on sctp_chunk now shows 2 memory holes:
         struct sctp_chunk {
      	struct list_head           list;
      	atomic_t                   refcnt;
      	/* XXX 4 bytes hole, try to pack */
      	...
      	long unsigned int          prsctp_param;
      	int                        sent_count;
      	/* XXX 4 bytes hole, try to pack */
      
      This patch is to move up sent_count to fill the 1st one and eliminate
      the 2nd one.
      
      It's not just another struct compaction, it also fixes the "netperf-
      Throughput_Mbps -37.2% regression" issue when overloading the CPU.
      
      Fixes: a6c2f792 ("sctp: implement prsctp TTL policy")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
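The hole-filling move can be reproduced in userspace with two cut-down structs. This is a sketch only: `list_head_model` and the `other` member are hypothetical stand-ins for the members elided in the pahole excerpt, and the padding comments assume a typical LP64 ABI with 8-byte pointers and longs.

```c
#include <assert.h>
#include <stddef.h>

struct list_head_model {
    void *next, *prev;
};

/* Before: int members are each followed by padding on LP64. */
struct chunk_with_holes {
    struct list_head_model list;
    int refcnt;                  /* 4-byte hole follows on LP64 */
    void *other;                 /* stand-in for the elided members */
    unsigned long prsctp_param;
    int sent_count;              /* 4 bytes of tail padding follow on LP64 */
};

/* After: sent_count moves up next to refcnt and fills the first hole. */
struct chunk_packed {
    struct list_head_model list;
    int refcnt;
    int sent_count;
    void *other;                 /* stand-in for the elided members */
    unsigned long prsctp_param;
};

/* Bytes saved by the reordering on the current target. */
static size_t bytes_saved(void)
{
    assert(sizeof(struct chunk_with_holes) >= sizeof(struct chunk_packed));
    return sizeof(struct chunk_with_holes) - sizeof(struct chunk_packed);
}
```

On targets where pointers and ints have the same size there is no hole to begin with, so the two layouts come out equal in size.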
  11. 19 Sep, 2016 2 commits
  12. 14 Jul, 2016 3 commits
    • sctp: avoid identifying address family many times for a chunk · e7487c86
      Marcelo Ricardo Leitner committed
      Identifying address family operations during the rx path is not
      expensive, but it's ugly to have it done multiple times, especially
      when we already validated it during initial rx processing.
      
      This patch takes advantage of the now shared sctp_input_cb and makes the
      pointer to the operations readily available.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: allow GSO frags to access the chunk too · 1f45f78f
      Marcelo Ricardo Leitner committed
      SCTP will try to access original IP headers on sctp_recvmsg in order to
      copy the addresses used. There are also other places that do similar access
      to IP or even SCTP headers. But after 90017acc ("sctp: Add GSO
      support") they aren't always there because they are only present in the
      header skb.
      
      SCTP handles the queueing of incoming data by cloning the incoming skb
      and limiting to only the relevant payload. This clone has its cb updated
      to something different and it's then queued on socket rx queue. Thus we
      need to fix this in two places.
      
      For the rx path, not related to the socket queue yet, this patch uses a
      partially copied sctp_input_cb for such GSO frags. This restores the
      ability to access the headers for this part of the code.
      
      Regarding the socket rx queue, it removes the iif member from sctp_event
      and also adds a chunk pointer to it.
      
      With these changes we're always able to reach the headers again.
      
      The biggest change here is that now the sctp_chunk struct and the
      original skb are only freed after the application consumed the buffer.
      Note however that the original payload was already like this due to the
      skb cloning.
      
      For iif, SCTP's IPv4 code doesn't use it, so no change is necessary.
      IPv6 now can fetch it directly from original's IPv6 CB as the original
      skb is still accessible.
      
      In the future we probably can simplify the sctp_v*_skb_iif() stuff, as
      sctp_v4_skb_iif() was called but its return value was not used, and now
      it's not even called, but such cleanup is out of scope for this change.
      
      Fixes: 90017acc ("sctp: Add GSO support")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: allow others to use sctp_input_cb · 9e238323
      Marcelo Ricardo Leitner committed
      We process input path in other files too and having access to it is
      nice, so move it to a header where it's shared.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 12 Jul, 2016 4 commits
    • sctp: implement prsctp PRIO policy · 8dbdf1f5
      Xin Long committed
      prsctp PRIO policy is a policy to abandon lower priority chunks when
      asoc doesn't have enough snd buffer, so that the current chunk with
      higher priority can be queued successfully.
      
      Similar to the TTL/RTX policy, we will set the priority of the chunk to
      prsctp_param with sinfo->sinfo_timetolive in sctp_set_prsctp_policy().
      So if PRIO policy is enabled, msg->expires_at won't work.
      
      asoc->sent_cnt_removable will record how many chunks can be checked to
      remove. If priority policy is enabled, when the chunk is queued into
      the out_queue, we will increase sent_cnt_removable. When the chunk is
      moved to abandon_queue or dequeue and free, we will decrease
      sent_cnt_removable.
      
      In sctp_sendmsg, we will check whether there is enough snd buffer for
      the current msg and whether sent_cnt_removable is not 0. If there is
      not enough and chunks are removable, sctp_prune_prsctp tries to abandon
      chunks from the retransmit/transmitted queue and free chunks from
      out_queue in the right order until the abandon+free size >
      msg_len - sctp_wfree. For the abandon size, we have to wait until it
      sends FORWARD TSN, receives the sack and the chunks are really freed.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
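A rough model of the pruning step: walk a queue and abandon lower-priority chunks until enough bytes have been reclaimed for the new message. The names `chunk_model` and `prune_lower_priority` are hypothetical, and treating a larger numeric value as lower priority is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queued chunk: priority value, payload length, abandoned flag. */
struct chunk_model {
    unsigned int prio;   /* assumed: larger value means lower priority */
    size_t len;
    bool abandoned;
};

/* Abandon chunks whose priority is lower than new_prio until at least
 * 'needed' bytes are reclaimed. Returns the number of bytes reclaimed. */
static size_t prune_lower_priority(struct chunk_model *queue, size_t n,
                                   unsigned int new_prio, size_t needed)
{
    size_t freed = 0;

    assert(queue != NULL || n == 0);
    for (size_t i = 0; i < n && freed < needed; i++) {
        if (queue[i].abandoned || queue[i].prio <= new_prio)
            continue;    /* same or higher priority: keep it */
        queue[i].abandoned = true;
        freed += queue[i].len;
    }
    return freed;
}
```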
    • sctp: implement prsctp TTL policy · a6c2f792
      Xin Long committed
      prsctp TTL policy is a policy to abandon chunks when they expire
      at a specific time in the local stack. It's similar to expires_at
      in struct sctp_datamsg.
      
      This patch uses sinfo->sinfo_timetolive to set the specific time for
      TTL policy. sinfo->sinfo_timetolive is also used for msg->expires_at.
      So if prsctp_enable or TTL policy is not enabled, msg->expires_at
      still works as before.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add SCTP_PR_ASSOC_STATUS on sctp sockopt · 826d253d
      Xin Long committed
      This patch adds SCTP_PR_ASSOC_STATUS to sctp sockopt, which is used
      to dump the prsctp statistics info from the asoc. The prsctp statistics
      include abandoned_sent/unsent from the asoc. abandoned_sent is the
      count of the packets we drop from the retransmit/transmitted queue,
      and abandoned_unsent is the count of the packets we drop from out_queue
      according to the policy.
      
      Note: another option for the prsctp statistics dump described in the rfc
      is SCTP_PR_STREAM_STATUS, which is used to dump the prsctp statistics
      info from each stream. But for now, linux doesn't yet have per stream
      statistics info; it needs rfc6525 to be implemented. As the prsctp
      statistics for each stream have to be based on per stream statistics,
      we will delay it until rfc6525 is done in linux.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add SCTP_PR_SUPPORTED on sctp sockopt · 28aa4c26
      Xin Long committed
      According to section 4.5 of rfc7496, prsctp_enable should be per asoc.
      We will add prsctp_enable to both asoc and ep, and replace the places
      that used net.sctp->prsctp_enable with asoc->prsctp_enable.
      
      ep->prsctp_enable will be initialized with net.sctp->prsctp_enable, and
      asoc->prsctp_enable will be initialized with ep->prsctp_enable. We can
      also modify its value through sockopt SCTP_PR_SUPPORTED.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
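The netns to ep to asoc default chain can be sketched with three tiny structs. The struct and function names here are hypothetical; the point is only that each level copies its default from the level above at init time, so a later per-endpoint change does not touch the namespace-wide setting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the three levels that carry prsctp_enable. */
struct netns_model { bool prsctp_enable; };
struct ep_model    { bool prsctp_enable; };
struct asoc_model  { bool prsctp_enable; };

/* The endpoint starts from the namespace-wide default. */
static void ep_model_init(struct ep_model *ep, const struct netns_model *net)
{
    assert(ep != NULL && net != NULL);
    ep->prsctp_enable = net->prsctp_enable;
}

/* The association starts from its endpoint's current value. */
static void asoc_model_init(struct asoc_model *asoc, const struct ep_model *ep)
{
    assert(asoc != NULL && ep != NULL);
    asoc->prsctp_enable = ep->prsctp_enable;
}
```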
  14. 04 Jun, 2016 1 commit
    • sctp: Add GSO support · 90017acc
      Marcelo Ricardo Leitner committed
      SCTP has this peculiarity that its packets cannot be just segmented to
      (P)MTU. Its chunks must be contained in IP segments, padding respected.
      So we can't just generate a big skb, set gso_size to the fragmentation
      point and deliver it to IP layer.
      
      This patch takes a different approach. SCTP will now build a skb as it
      would be if it was received using GRO. That is, there will be a cover
      skb with protocol headers and children ones containing the actual
      segments, already segmented in a way that respects SCTP RFCs.
      
      With that, we can tell skb_segment() to just split based on frag_list,
      trusting its sizes are already in accordance.
      
      This way SCTP can benefit from GSO and instead of passing several
      packets through the stack, it can pass a single large packet.
      
      v2:
      - Added support for receiving GSO frames, as requested by Dave Miller.
      - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
      - Added heuristics similar to what we have in TCP for not generating
        single GSO packets that fills cwnd.
      v3:
      - consider sctphdr size in skb_gso_transport_seglen()
      - rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
        unsupported GSO")
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
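The constraint that chunks stay whole and padded inside each segment can be sketched as a packing loop. This is an illustrative model under assumptions: `pad4` and `pack_chunks` are hypothetical helpers, protocol headers are ignored, and chunks are padded to 4-byte boundaries.

```c
#include <assert.h>
#include <stddef.h>

/* Round a chunk length up to the next multiple of 4 bytes. */
static size_t pad4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/* Group whole, padded chunks into segments of at most max_seg bytes.
 * Writes each segment's size to seg_sizes (capacity cap) and returns the
 * number of segments, or 0 if a chunk cannot fit or cap is exceeded. */
static size_t pack_chunks(const size_t *chunk_len, size_t n, size_t max_seg,
                          size_t *seg_sizes, size_t cap)
{
    size_t nseg = 0, cur = 0;

    assert(chunk_len != NULL && seg_sizes != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t padded = pad4(chunk_len[i]);

        if (padded > max_seg)
            return 0;                 /* a chunk is never split */
        if (cur + padded > max_seg) { /* close the current segment */
            if (nseg == cap)
                return 0;
            seg_sizes[nseg++] = cur;
            cur = 0;
        }
        cur += padded;
    }
    if (cur > 0) {
        if (nseg == cap)
            return 0;
        seg_sizes[nseg++] = cur;
    }
    return nseg;
}
```

Because the segment boundaries are chosen up front, a later splitter only has to follow them, which mirrors the idea of letting skb_segment() split on frag_list.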
  15. 02 May, 2016 1 commit
  16. 14 Apr, 2016 2 commits
  17. 11 Apr, 2016 1 commit
    • sctp: avoid refreshing heartbeat timer too often · ba6f5e33
      Marcelo Ricardo Leitner committed
      Currently on high rate SCTP streams the heartbeat timer refresh can
      consume quite a lot of resources as timer updates are costly and it
      contains a random factor, which a) is also costly and b) invalidates
      mod_timer() optimization for not editing a timer to the same value.
      It may even cause the timer to be slightly advanced, for no good reason.
      
      As suggested by David Laight this patch now removes this timer update
      from hot path by leaving the timer on and re-evaluating upon its
      expiration if the heartbeat is still needed or not, similarly to what is
      done for TCP. If it's not needed anymore the timer is re-scheduled to
      the new timeout, considering the time already elapsed.
      
      For this, we now record the last tx timestamp per transport, updated in
      the same spots as hb timer was restarted on tx. Also split up
      sctp_transport_reset_timers into sctp_transport_reset_t3_rtx and
      sctp_transport_reset_hb_timer, so we can re-arm T3 without re-arming the
      heartbeat one.
      
      On loopback with MTU of 65535 and data chunks with 1636, so that we
      have a considerable amount of chunks without stressing system calls,
      netperf -t SCTP_STREAM -l 30, perf looked like this before:
      
      Samples: 103K of event 'cpu-clock', Event count (approx.): 25833000000
        Overhead  Command  Shared Object      Symbol
      +    6,15%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,43%  netperf  [kernel.vmlinux]   [k] _raw_write_unlock_irqrestore
         - _raw_write_unlock_irqrestore
            - 96,54% _raw_spin_unlock_irqrestore
               - 36,14% mod_timer
                  + 97,24% sctp_transport_reset_timers
                  + 2,76% sctp_do_sm
               + 33,65% __wake_up_sync_key
               + 28,77% sctp_ulpq_tail_event
               + 1,40% del_timer
            - 1,84% mod_timer
               + 99,03% sctp_transport_reset_timers
               + 0,97% sctp_do_sm
            + 1,50% sctp_ulpq_tail_event
      
      And after this patch, now with netperf -l 60:
      
      Samples: 230K of event 'cpu-clock', Event count (approx.): 57707250000
        Overhead  Command  Shared Object      Symbol
      +    5,65%  netperf  [kernel.vmlinux]   [k] memcpy_erms
      +    5,59%  netperf  [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
      -    5,05%  netperf  [kernel.vmlinux]   [k] _raw_spin_unlock_irqrestore
         - _raw_spin_unlock_irqrestore
            + 49,89% __wake_up_sync_key
            + 45,68% sctp_ulpq_tail_event
            - 2,85% mod_timer
               + 76,51% sctp_transport_reset_t3_rtx
               + 23,49% sctp_do_sm
            + 1,55% del_timer
      +    2,50%  netperf  [sctp]             [k] sctp_datamsg_from_user
      +    2,26%  netperf  [sctp]             [k] sctp_sendmsg
      
      Throughput-wise, from 6800mbps without the patch to 7050mbps with it,
      ~3.7%.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
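The "leave the timer running and re-evaluate on expiry" approach can be sketched as follows. This is a simplified model: `transport_model` and `hb_timer_expired` are hypothetical names, time is a plain tick counter, and both the random jitter and counter wraparound are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-transport state: only what the decision needs. */
struct transport_model {
    unsigned long last_time_sent;  /* tick of the most recent transmission */
    unsigned long hb_interval;     /* heartbeat interval in ticks */
};

/* Called when the heartbeat timer fires at 'now'. The tx hot path only
 * updates last_time_sent and never touches the timer. Returns true if a
 * heartbeat is due; either way *next_expiry receives the time to re-arm. */
static bool hb_timer_expired(const struct transport_model *t,
                             unsigned long now, unsigned long *next_expiry)
{
    unsigned long deadline;

    assert(t != NULL && next_expiry != NULL);
    deadline = t->last_time_sent + t->hb_interval;
    if (now < deadline) {
        /* Data went out recently: no heartbeat needed yet, so re-arm for
         * the remainder of the interval, counting the time already elapsed. */
        *next_expiry = deadline;
        return false;
    }
    *next_expiry = now + t->hb_interval;
    return true;
}
```

The cost of deciding moves from every transmission to one check per timer expiry, which is the trade the commit message describes.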
  18. 14 Mar, 2016 1 commit
    • sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Marcelo Ricardo Leitner committed
      Currently sctp_sendmsg() triggers some calls that will allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit it will allocate a linear skb that will be used to
      construct the packet and this may cause sends to fail due to ENOMEM more
      often than anticipated, especially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented and made it trigger ENOMEM way less often.
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. 09 Mar, 2016 1 commit
  20. 18 Feb, 2016 1 commit
  21. 29 Jan, 2016 2 commits
  22. 27 Jan, 2016 1 commit
  23. 06 Jan, 2016 2 commits
  24. 07 Dec, 2015 1 commit
    • sctp: start t5 timer only when peer rwnd is 0 and local state is SHUTDOWN_PENDING · 8a0d19c5
      lucien committed
      When A sends data to B, then A calls close() and enters the
      SHUTDOWN_PENDING state, if B neither claims its rwnd is 0 nor sends a
      SACK for this data, A will keep retransmitting this data until the t5
      timeout. Max.Retrans no longer takes effect, which is bad.
      
      If B's rwnd is not 0, A should send an abort after Max.Retrans
      retransmissions. Only when B's rwnd == 0 and A's retransmissions exceed
      Max.Retrans should A start the t5 timer. This is also what commit
      f8d96052 ("sctp: Enforce retransmission limit during shutdown") means,
      but it lacks the condition peer rwnd == 0.
      
      So fix it by adding a bit (zero_window_announced) in peer to record
      whether the last announced rwnd was 0. If it was, zero_window_announced
      is set. Use this bit to decide whether to start the t5 timer when
      local.state is SHUTDOWN_PENDING.
      
      Fixes: commit f8d96052 ("sctp: Enforce retransmission limit during shutdown")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
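The decision this commit describes can be sketched as a small function. It is a reading of the commit message, not the kernel state machine: the `rtx_action` enum and `on_rtx_limit_check` are hypothetical, and the error counter semantics are simplified.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a retransmission timeout. */
enum rtx_action {
    RTX_RETRANSMIT,  /* still within Max.Retrans */
    RTX_START_T5,    /* peer announced a zero window: bound the wait with t5 */
    RTX_ABORT        /* limit exceeded and the peer's window was not zero */
};

/* Decide what to do once a retransmission timer fires. */
static enum rtx_action on_rtx_limit_check(bool shutdown_pending,
                                          bool zero_window_announced,
                                          unsigned int error_count,
                                          unsigned int max_retrans)
{
    assert(max_retrans > 0);
    if (error_count <= max_retrans)
        return RTX_RETRANSMIT;
    /* Only a peer whose last announced rwnd was 0 earns the t5 grace period. */
    if (shutdown_pending && zero_window_announced)
        return RTX_START_T5;
    return RTX_ABORT;
}
```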