1. 16 4月, 2016 1 次提交
    • M
      sctp: simplify sk_receive_queue locking · 311b2177
      Marcelo Ricardo Leitner 提交于
      SCTP already serializes access to rcvbuf through its sock lock:
      sctp_recvmsg takes it right in the start and release at the end, while
      rx path will also take the lock before doing any socket processing. On
      sctp_rcv() it will check if there is an user using the socket and, if
      there is, it will queue incoming packets to the backlog. The backlog
      processing will do the same. Even timers will do such check and
      re-schedule if an user is using the socket.
      
      Simplifying this will allow us to remove sctp_skb_list_tail and get ride
      of some expensive lockings.  The lists that it is used on are also
      mangled with functions like __skb_queue_tail and __skb_unlink in the
      same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
      sctp_close() will also purge those while using only the sock lock.
      
      Therefore the lockings performed by sctp_skb_list_tail() are not
      necessary. This patch removes this function and replaces its calls with
      just skb_queue_splice_tail_init() instead.
      
      The biggest gain is at sctp_ulpq_tail_event(), because the events always
      contain a list, even if it's queueing a single skb and this was
      triggering expensive calls to spin_lock_irqsave/_irqrestore for every
      data chunk received.
      
      As SCTP will deliver each data chunk on a corresponding recvmsg, the
      more effective the change will be.
      Before this patch, with chunks with 30 bytes:
      netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
      400000 -s 400000 400000
      on a 10Gbit link with 1500 MTU:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608
      
      With it:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      311b2177
  2. 15 4月, 2016 1 次提交
  3. 14 4月, 2016 1 次提交
  4. 06 4月, 2016 1 次提交
  5. 05 4月, 2016 1 次提交
  6. 31 3月, 2016 1 次提交
  7. 23 3月, 2016 1 次提交
  8. 21 3月, 2016 3 次提交
    • M
      sctp: align MTU to a word · 3822a5ff
      Marcelo Ricardo Leitner 提交于
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      Same applies to icmp frag needed messages, which is also fixed by this
      patch.
      
      One other effect of this is the inability to send MTU-sized packet
      without queueing or fragmentation and without hitting Nagle. As the
      check performed at sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      with the above example of MTU, if there are no other messages queued,
      one cannot send a packet that just fits one packet (65532 bytes) and
      without causing DATA chunk fragmentation or a delay.
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3822a5ff
    • M
      sctp: do not leak chunks that are sent to unconfirmed paths · 31b055ef
      Marcelo Ricardo Leitner 提交于
      Currently, if a chunk is scheduled to be sent through a transport that
      is currently unconfirmed, it will be leaked as it is dequeued from outq
      and is not re-queued nor freed.
      
      As I'm not aware of any situation that may lead to this situation, I'm
      fixing this by freeing the chunk and also logging a trace so that we can
      fix the other bug if it ever happens.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      31b055ef
    • M
      sctp: do not update a_rwnd if we are not issuing a sack · 07b4d6a1
      Marcelo Ricardo Leitner 提交于
      The SACK can be lost pretty much elsewhere, but if its allocation fail,
      we know we are not sending it, so it is better to revert a_rwnd to its
      previous value as this may give it a chance to issue a window update
      later.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07b4d6a1
  9. 17 3月, 2016 1 次提交
  10. 14 3月, 2016 2 次提交
    • M
      sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Marcelo Ricardo Leitner 提交于
      Currently sctp_sendmsg() triggers some calls that will allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit it will allocate a linear skb that will be used to
      construct the packet and this may cause sends to fail due to ENOMEM more
      often than anticipated specially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented and made it trigger ENOMEM way less often.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cea8768f
    • X
      sctp: fix the transports round robin issue when init is retransmitted · 39d2adeb
      Xin Long 提交于
      prior to this patch, at the beginning if we have two paths in one assoc,
      they may have the same params other than the last_time_heard, it will try
      the paths like this:
      
      1st cycle
        try trans1 fail.
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      2nd cycle:
        try  trans2 fail
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      3rd cycle:
        try  trans2 fail
        then trans2 is selected.(cause it's last_time_heard is after trans1).
      
      ....
      
      trans1 will never have change to be selected, which is not what we expect.
      we should keeping round robin all the paths if they are just added at the
      beginning.
      
      So at first every tranport's last_time_heard should be initialized 0, so
      that we ensure they have the same value at the beginning, only by this,
      all the transports could get equal chance to be selected.
      
      Then for sctp_trans_elect_best, it should return the trans_next one when
      *trans == *trans_next, so that we can try next if it fails,  but now it
      always return trans. so we can fix it by exchanging these two params when
      we calls sctp_trans_elect_tie().
      
      Fixes: 4c47af4d ('net: sctp: rework multihoming retransmission path selection to rfc4960')
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      39d2adeb
  11. 09 3月, 2016 1 次提交
  12. 02 3月, 2016 3 次提交
  13. 22 2月, 2016 1 次提交
    • N
      sctp: Fix port hash table size computation · d9749fb5
      Neil Horman 提交于
      Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
      its size computation, observing that the current method never guaranteed
      that the hashsize (measured in number of entries) would be a power of two,
      which the input hash function for that table requires.  The root cause of
      the problem is that two values need to be computed (one, the allocation
      order of the storage requries, as passed to __get_free_pages, and two the
      number of entries for the hash table).  Both need to be ^2, but for
      different reasons, and the existing code is simply computing one order
      value, and using it as the basis for both, which is wrong (i.e. it assumes
      that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).
      
      To fix this, we change the logic slightly.  We start by computing a goal
      allocation order (which is limited by the maximum size hash table we want
      to support.  Then we attempt to allocate that size table, decreasing the
      order until a successful allocation is made.  Then, with the resultant
      successful order we compute the number of buckets that hash table supports,
      which we then round down to the nearest power of two, giving us the number
      of entries the table actually supports.
      
      I've tested this locally here, using non-debug and spinlock-debug kernels,
      and the number of entries in the hashtable consistently work out to be
      powers of two in all cases.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      CC: Dmitry Vyukov <dvyukov@google.com>
      CC: Vladislav Yasevich <vyasevich@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9749fb5
  14. 18 2月, 2016 3 次提交
  15. 11 2月, 2016 1 次提交
  16. 09 2月, 2016 1 次提交
  17. 29 1月, 2016 3 次提交
  18. 27 1月, 2016 1 次提交
  19. 25 1月, 2016 1 次提交
  20. 18 1月, 2016 1 次提交
  21. 16 1月, 2016 2 次提交
  22. 12 1月, 2016 1 次提交
  23. 11 1月, 2016 1 次提交
  24. 06 1月, 2016 5 次提交
  25. 31 12月, 2015 1 次提交
    • X
      sctp: sctp should release assoc when sctp_make_abort_user return NULL in sctp_close · 068d8bd3
      Xin Long 提交于
      In sctp_close, sctp_make_abort_user may return NULL because of memory
      allocation failure. If this happens, it will bypass any state change
      and never free the assoc. The assoc has no chance to be freed and it
      will be kept in memory with the state it had even after the socket is
      closed by sctp_close().
      
      So if sctp_make_abort_user fails to allocate memory, we should abort
      the asoc via sctp_primitive_ABORT as well. Just like the annotation in
      sctp_sf_cookie_wait_prm_abort and sctp_sf_do_9_1_prm_abort said,
      "Even if we can't send the ABORT due to low memory delete the TCB.
      This is a departure from our typical NOMEM handling".
      
      But then the chunk is NULL (low memory) and the SCTP_CMD_REPLY cmd would
      dereference the chunk pointer, and system crash. So we should add
      SCTP_CMD_REPLY cmd only when the chunk is not NULL, just like other
      places where it adds SCTP_CMD_REPLY cmd.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      068d8bd3
  26. 28 12月, 2015 1 次提交