1. 27 12月, 2019 1 次提交
    • X
      sctp: cache netns in sctp_ep_common · d98f8758
      Xin Long 提交于
      [ Upstream commit 312434617cb16be5166316cf9d08ba760b1042a1 ]
      
      This patch is to fix a data-race reported by syzbot:
      
        BUG: KCSAN: data-race in sctp_assoc_migrate / sctp_hash_obj
      
        write to 0xffff8880b67c0020 of 8 bytes by task 18908 on cpu 1:
          sctp_assoc_migrate+0x1a6/0x290 net/sctp/associola.c:1091
          sctp_sock_migrate+0x8aa/0x9b0 net/sctp/socket.c:9465
          sctp_accept+0x3c8/0x470 net/sctp/socket.c:4916
          inet_accept+0x7f/0x360 net/ipv4/af_inet.c:734
          __sys_accept4+0x224/0x430 net/socket.c:1754
          __do_sys_accept net/socket.c:1795 [inline]
          __se_sys_accept net/socket.c:1792 [inline]
          __x64_sys_accept+0x4e/0x60 net/socket.c:1792
          do_syscall_64+0xcc/0x370 arch/x86/entry/common.c:290
          entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
        read to 0xffff8880b67c0020 of 8 bytes by task 12003 on cpu 0:
          sctp_hash_obj+0x4f/0x2d0 net/sctp/input.c:894
          rht_key_get_hash include/linux/rhashtable.h:133 [inline]
          rht_key_hashfn include/linux/rhashtable.h:159 [inline]
          rht_head_hashfn include/linux/rhashtable.h:174 [inline]
          head_hashfn lib/rhashtable.c:41 [inline]
          rhashtable_rehash_one lib/rhashtable.c:245 [inline]
          rhashtable_rehash_chain lib/rhashtable.c:276 [inline]
          rhashtable_rehash_table lib/rhashtable.c:316 [inline]
          rht_deferred_worker+0x468/0xab0 lib/rhashtable.c:420
          process_one_work+0x3d4/0x890 kernel/workqueue.c:2269
          worker_thread+0xa0/0x800 kernel/workqueue.c:2415
          kthread+0x1d4/0x200 drivers/block/aoe/aoecmd.c:1253
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:352
      
      It was caused by rhashtable access asoc->base.sk when sctp_assoc_migrate
      is changing its value. However, what rhashtable wants is netns from asoc
      base.sk, and for an asoc, its netns won't change once set. So we can
      simply fix it by caching netns since created.
      
      Fixes: d6c0256a ("sctp: add the rhashtable apis for sctp global transport hashtable")
      Reported-by: syzbot+e3b35fe7918ff0ee474e@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
      d98f8758
  2. 16 10月, 2018 1 次提交
    • X
      sctp: use the pmtu from the icmp packet to update transport pathmtu · d805397c
      Xin Long 提交于
      Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
      is also processing transport pmtu_pending by icmp packets. But it's
      meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.
      
      The right pmtu value should come from the icmp packet, and it would
      be saved into transport->mtu_info in this patch and used later when
      the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.
      
      Besides, without this patch, as pmtu can only be updated correctly
      when receiving a icmp packet and no place is holding sock lock, it
      will take long time if the sock is busy with sending packets.
      
      Note that it doesn't process transport->mtu_info in .release_cb(),
      as there is no enough information for pmtu update, like for which
      asoc or transport. It is not worth traversing all asocs to check
      pmtu_pending. So unlike tcp, sctp does this in tx path, for which
      mtu_info needs to be atomic_t.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d805397c
  3. 22 6月, 2018 1 次提交
    • N
      rhashtable: split rhashtable.h · 0eb71a9d
      NeilBrown 提交于
      Due to the use of rhashtables in net namespaces,
      rhashtable.h is included in lots of the kernel,
      so a small changes can required a large recompilation.
      This makes development painful.
      
      This patch splits out rhashtable-types.h which just includes
      the major type declarations, and does not include (non-trivial)
      inline code.  rhashtable.h is no longer included by anything
      in the include/ directory.
      Common include files only include rhashtable-types.h so a large
      recompilation is only triggered when that changes.
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0eb71a9d
  4. 27 3月, 2018 1 次提交
  5. 10 3月, 2018 1 次提交
  6. 13 2月, 2018 1 次提交
  7. 09 1月, 2018 2 次提交
    • M
      sctp: fix the handling of ICMP Frag Needed for too small MTUs · b6c5734d
      Marcelo Ricardo Leitner 提交于
      syzbot reported a hang involving SCTP, on which it kept flooding dmesg
      with the message:
      [  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
      low, using default minimum of 512
      
      That happened because whenever SCTP hits an ICMP Frag Needed, it tries
      to adjust to the new MTU and triggers an immediate retransmission. But
      it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
      allowed (512) would not cause the PMTU to change, and issued the
      retransmission anyway (thus leading to another ICMP Frag Needed, and so
      on).
      
      As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
      are higher than that, sctp_transport_update_pmtu() is changed to
      re-fetch the PMTU that got set after our request, and with that, detect
      if there was an actual change or not.
      
      The fix, thus, skips the immediate retransmission if the received ICMP
      resulted in no change, in the hope that SCTP will select another path.
      
      Note: The value being used for the minimum MTU (512,
      SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
      SCTP_MIN_PMTU), but such change belongs to another patch.
      
      Changes from v1:
      - do not disable PMTU discovery, in the light of commit
      06ad3919 ("[SCTP] Don't disable PMTU discovery when mtu is small")
      and as suggested by Xin Long.
      - changed the way to break the rtx loop by detecting if the icmp
        resulted in a change or not
      Changes from v2:
      none
      
      See-also: https://lkml.org/lkml/2017/12/22/811Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b6c5734d
    • M
      sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled · cc35c3d1
      Marcelo Ricardo Leitner 提交于
      Currently, if PMTU discovery is disabled on a given transport, but the
      configured value is higher than the actual PMTU, it is likely that we
      will get some icmp Frag Needed. The issue is, if PMTU discovery is
      disabled, we won't update the information and will issue a
      retransmission immediately, which may very well trigger another ICMP,
      and another retransmission, leading to a loop.
      
      The fix is to simply not trigger immediate retransmissions if PMTU
      discovery is disabled on the given transport.
      
      Changes from v2:
      - updated stale comment, noticed by Xin Long
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc35c3d1
  8. 29 10月, 2017 1 次提交
  9. 20 10月, 2017 1 次提交
  10. 04 8月, 2017 1 次提交
  11. 02 7月, 2017 2 次提交
  12. 27 5月, 2017 1 次提交
  13. 05 4月, 2017 1 次提交
  14. 02 3月, 2017 1 次提交
  15. 20 2月, 2017 1 次提交
    • X
      sctp: check duplicate node before inserting a new transport · cd2b7087
      Xin Long 提交于
      sctp has changed to use rhlist for transport rhashtable since commit
      7fda702f ("sctp: use new rhlist interface on sctp transport
      rhashtable").
      
      But rhltable_insert_key doesn't check the duplicate node when inserting
      a node, unlike rhashtable_lookup_insert_key. It may cause duplicate
      assoc/transport in rhashtable. like:
      
       client (addr A, B)                 server (addr X, Y)
          connect to X           INIT (1)
                              ------------>
          connect to Y           INIT (2)
                              ------------>
                               INIT_ACK (1)
                              <------------
                               INIT_ACK (2)
                              <------------
      
      After sending INIT (2), one transport will be created and hashed into
      rhashtable. But when receiving INIT_ACK (1) and processing the address
      params, another transport will be created and hashed into rhashtable
      with the same addr Y and EP as the last transport. This will confuse
      the assoc/transport's lookup.
      
      This patch is to fix it by returning err if any duplicate node exists
      before inserting it.
      
      Fixes: 7fda702f ("sctp: use new rhlist interface on sctp transport rhashtable")
      Reported-by: NFabio M. Di Nitto <fdinitto@redhat.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd2b7087
  16. 29 12月, 2016 1 次提交
  17. 17 11月, 2016 1 次提交
    • X
      sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long 提交于
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      It may cause to keep returning -EBUSY when inserting a new node, as the
      chain is too long and sctp inserts a transport node in a loop, which
      could even lead to system hangs there.
      
      The new rhlist interface works for this case that there are many nodes
      with the same key in one chain. It puts them into a list then makes this
      list be as a node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fda702f
  18. 01 11月, 2016 2 次提交
    • X
      sctp: hold transport instead of assoc when lookup assoc in rx path · dae399d7
      Xin Long 提交于
      Prior to this patch, in rx path, before calling lock_sock, it needed to
      hold assoc when got it by __sctp_lookup_association, in case other place
      would free/put assoc.
      
      But in __sctp_lookup_association, it lookup and hold transport, then got
      assoc by transport->assoc, then hold assoc and put transport. It means
      it didn't hold transport, yet it was returned and later on directly
      assigned to chunk->transport.
      
      Without the protection of sock lock, the transport may be freed/put by
      other places, which would cause a use-after-free issue.
      
      This patch is to fix this issue by holding transport instead of assoc.
      As holding transport can make sure to access assoc is also safe, and
      actually it looks up assoc by searching transport rhashtable, to hold
      transport here makes more sense.
      
      Note that the function will be renamed later on on another patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dae399d7
    • X
      sctp: return back transport in __sctp_rcv_init_lookup · 7c17fcc7
      Xin Long 提交于
      Prior to this patch, it used a local variable to save the transport that is
      looked up by __sctp_lookup_association(), and didn't return it back. But in
      sctp_rcv, it is used to initialize chunk->transport. So when hitting this,
      even if it found the transport, it was still initializing chunk->transport
      with null instead.
      
      This patch is to return the transport back through transport pointer
      that is from __sctp_rcv_lookup_harder().
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c17fcc7
  19. 22 9月, 2016 1 次提交
  20. 13 9月, 2016 1 次提交
    • X
      sctp: hold the transport before using it in sctp_hash_cmp · 715f5552
      Xin Long 提交于
      Since commit 4f008781 ("sctp: apply rhashtable api to send/recv
      path"), sctp uses transport rhashtable with .obj_cmpfn sctp_hash_cmp,
      in which it compares the members of the transport with the rhashtable
      args to check if it's the right transport.
      
      But sctp uses the transport without holding it in sctp_hash_cmp, it can
      cause a use-after-free panic. As after it gets transport from hashtable,
      another CPU may close the sk and free the asoc. In sctp_association_free,
      it frees all the transports, meanwhile, the assoc's refcnt may be reduced
      to 0, assoc can be destroyed by sctp_association_destroy.
      
      So after that, transport->assoc is actually an unavailable memory address
      in sctp_hash_cmp. Although sctp_hash_cmp is under rcu_read_lock, it still
      can not avoid this, as assoc is not freed by RCU.
      
      This patch is to hold the transport before checking it's members with
      sctp_transport_hold, in which it checks the refcnt first, holds it if
      it's not 0.
      
      Fixes: 4f008781 ("sctp: apply rhashtable api to send/recv path")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      715f5552
  21. 20 8月, 2016 1 次提交
    • M
      sctp: linearize early if it's not GSO · 4c2f2454
      Marcelo Ricardo Leitner 提交于
      Because otherwise when crc computation is still needed it's way more
      expensive than on a linear buffer to the point that it affects
      performance.
      
      It's so expensive that netperf test gives a perf output as below:
      
      Overhead  Command         Shared Object       Symbol
        18,62%  netserver       [kernel.vmlinux]    [k] crc32_generic_shift
         2,57%  netserver       [kernel.vmlinux]    [k] __pskb_pull_tail
         1,94%  netserver       [kernel.vmlinux]    [k] fib_table_lookup
         1,90%  netserver       [kernel.vmlinux]    [k] copy_user_enhanced_fast_string
         1,66%  swapper         [kernel.vmlinux]    [k] intel_idle
         1,63%  netserver       [kernel.vmlinux]    [k] _raw_spin_lock
         1,59%  netserver       [sctp]              [k] sctp_packet_transmit
         1,55%  netserver       [kernel.vmlinux]    [k] memcpy_erms
         1,42%  netserver       [sctp]              [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3016.42   2.88     3.78     1.874   2.462
      
      After patch:
      Overhead  Command         Shared Object      Symbol
         2,75%  netserver       [kernel.vmlinux]   [k] memcpy_erms
         2,63%  netserver       [kernel.vmlinux]   [k] copy_user_enhanced_fast_string
         2,39%  netserver       [kernel.vmlinux]   [k] fib_table_lookup
         2,04%  netserver       [kernel.vmlinux]   [k] __pskb_pull_tail
         1,91%  netserver       [kernel.vmlinux]   [k] _raw_spin_lock
         1,91%  netserver       [sctp]             [k] sctp_packet_transmit
         1,72%  netserver       [mlx4_en]          [k] mlx4_en_process_rx_cq
         1,68%  netserver       [sctp]             [k] sctp_rcv
      
      # netperf -H 192.168.10.1 -l 10 -t SCTP_STREAM -cC -- -m 12000
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.10.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      212992 212992  12000    10.00      3681.77   3.83     3.46     2.045   1.849
      
      Fixes: 3acb50c1 ("sctp: delay as much as possible skb_linearize")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4c2f2454
  22. 26 7月, 2016 1 次提交
    • M
      sctp: fix BH handling on socket backlog · eefc1b1d
      Marcelo Ricardo Leitner 提交于
      Now that the backlog processing is called with BH enabled, we have to
      disable BH before taking the socket lock via bh_lock_sock() otherwise
      it may dead lock:
      
      sctp_backlog_rcv()
                      bh_lock_sock(sk);
      
                      if (sock_owned_by_user(sk)) {
                              if (sk_add_backlog(sk, skb, sk->sk_rcvbuf))
                                      sctp_chunk_free(chunk);
                              else
                                      backloged = 1;
                      } else
                              sctp_inq_push(inqueue, chunk);
      
                      bh_unlock_sock(sk);
      
      while sctp_inq_push() was disabling/enabling BH, but enabling BH
      triggers pending softirq, which then may try to re-lock the socket in
      sctp_rcv().
      
      [  219.187215]  <IRQ>
      [  219.187217]  [<ffffffff817ca3e0>] _raw_spin_lock+0x20/0x30
      [  219.187223]  [<ffffffffa041888c>] sctp_rcv+0x48c/0xba0 [sctp]
      [  219.187225]  [<ffffffff816e7db2>] ? nf_iterate+0x62/0x80
      [  219.187226]  [<ffffffff816f1b14>] ip_local_deliver_finish+0x94/0x1e0
      [  219.187228]  [<ffffffff816f1e1f>] ip_local_deliver+0x6f/0xf0
      [  219.187229]  [<ffffffff816f1a80>] ? ip_rcv_finish+0x3b0/0x3b0
      [  219.187230]  [<ffffffff816f17a8>] ip_rcv_finish+0xd8/0x3b0
      [  219.187232]  [<ffffffff816f2122>] ip_rcv+0x282/0x3a0
      [  219.187233]  [<ffffffff810d8bb6>] ? update_curr+0x66/0x180
      [  219.187235]  [<ffffffff816abac4>] __netif_receive_skb_core+0x524/0xa90
      [  219.187236]  [<ffffffff810d8e00>] ? update_cfs_shares+0x30/0xf0
      [  219.187237]  [<ffffffff810d557c>] ? __enqueue_entity+0x6c/0x70
      [  219.187239]  [<ffffffff810dc454>] ? enqueue_entity+0x204/0xdf0
      [  219.187240]  [<ffffffff816ac048>] __netif_receive_skb+0x18/0x60
      [  219.187242]  [<ffffffff816ad1ce>] process_backlog+0x9e/0x140
      [  219.187243]  [<ffffffff816ac8ec>] net_rx_action+0x22c/0x370
      [  219.187245]  [<ffffffff817cd352>] __do_softirq+0x112/0x2e7
      [  219.187247]  [<ffffffff817cc3bc>] do_softirq_own_stack+0x1c/0x30
      [  219.187247]  <EOI>
      [  219.187248]  [<ffffffff810aa1c8>] do_softirq.part.14+0x38/0x40
      [  219.187249]  [<ffffffff810aa24d>] __local_bh_enable_ip+0x7d/0x80
      [  219.187254]  [<ffffffffa0408428>] sctp_inq_push+0x68/0x80 [sctp]
      [  219.187258]  [<ffffffffa04190f1>] sctp_backlog_rcv+0x151/0x1c0 [sctp]
      [  219.187260]  [<ffffffff81692b07>] __release_sock+0x87/0xf0
      [  219.187261]  [<ffffffff81692ba0>] release_sock+0x30/0xa0
      [  219.187265]  [<ffffffffa040e46d>] sctp_accept+0x17d/0x210 [sctp]
      [  219.187266]  [<ffffffff810e7510>] ? prepare_to_wait_event+0xf0/0xf0
      [  219.187268]  [<ffffffff8172d52c>] inet_accept+0x3c/0x130
      [  219.187269]  [<ffffffff8168d7a3>] SYSC_accept4+0x103/0x210
      [  219.187271]  [<ffffffff817ca2ba>] ? _raw_spin_unlock_bh+0x1a/0x20
      [  219.187272]  [<ffffffff81692bfc>] ? release_sock+0x8c/0xa0
      [  219.187276]  [<ffffffffa0413e22>] ? sctp_inet_listen+0x62/0x1b0 [sctp]
      [  219.187277]  [<ffffffff8168f2d0>] SyS_accept+0x10/0x20
      
      Fixes: 860fbbc3 ("sctp: prepare for socket backlog behavior change")
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eefc1b1d
  23. 19 7月, 2016 1 次提交
  24. 14 7月, 2016 2 次提交
  25. 04 6月, 2016 2 次提交
    • M
      sctp: Add GSO support · 90017acc
      Marcelo Ricardo Leitner 提交于
      SCTP has this pecualiarity that its packets cannot be just segmented to
      (P)MTU. Its chunks must be contained in IP segments, padding respected.
      So we can't just generate a big skb, set gso_size to the fragmentation
      point and deliver it to IP layer.
      
      This patch takes a different approach. SCTP will now build a skb as it
      would be if it was received using GRO. That is, there will be a cover
      skb with protocol headers and children ones containing the actual
      segments, already segmented to a way that respects SCTP RFCs.
      
      With that, we can tell skb_segment() to just split based on frag_list,
      trusting its sizes are already in accordance.
      
      This way SCTP can benefit from GSO and instead of passing several
      packets through the stack, it can pass a single large packet.
      
      v2:
      - Added support for receiving GSO frames, as requested by Dave Miller.
      - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
      - Added heuristics similar to what we have in TCP for not generating
        single GSO packets that fills cwnd.
      v3:
      - consider sctphdr size in skb_gso_transport_seglen()
      - rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
        unsupported GSO")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90017acc
    • M
      sctp: delay as much as possible skb_linearize · 3acb50c1
      Marcelo Ricardo Leitner 提交于
      This patch is a preparation for the GSO one. In order to successfully
      handle GSO packets on rx path we must not call skb_linearize, otherwise
      it defeats any gain GSO may have had.
      
      This patch thus delays as much as possible the call to skb_linearize,
      leaving it to sctp_inq_pop() moment. For that the sanity checks
      performed now know how to deal with fragments.
      
      One positive side-effect of this is that if the socket is backlogged it
      will have the chance of doing it on backlog processing instead of
      during softirq.
      
      With this move, it's evident that a check for non-linearity in
      sctp_inq_pop was ineffective and is now removed. Note that a similar
      check is performed a bit below this one.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3acb50c1
  26. 28 4月, 2016 3 次提交
  27. 21 3月, 2016 1 次提交
    • M
      sctp: align MTU to a word · 3822a5ff
      Marcelo Ricardo Leitner 提交于
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      Same applies to icmp frag needed messages, which is also fixed by this
      patch.
      
      One other effect of this is the inability to send MTU-sized packet
      without queueing or fragmentation and without hitting Nagle. As the
      check performed at sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      with the above example of MTU, if there are no other messages queued,
      one cannot send a packet that just fits one packet (65532 bytes) and
      without causing DATA chunk fragmentation or a delay.
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3822a5ff
  28. 14 3月, 2016 1 次提交
    • M
      sctp: allow sctp_transmit_packet and others to use gfp · cea8768f
      Marcelo Ricardo Leitner 提交于
      Currently sctp_sendmsg() triggers some calls that will allocate memory
      with GFP_ATOMIC even when not necessary. In the case of
      sctp_packet_transmit it will allocate a linear skb that will be used to
      construct the packet and this may cause sends to fail due to ENOMEM more
      often than anticipated specially with big MTUs.
      
      This patch thus allows it to inherit gfp flags from upper calls so that
      it can use GFP_KERNEL if it was triggered by a sctp_sendmsg call or
      similar. All others, like retransmits or flushes started from BH, are
      still allocated using GFP_ATOMIC.
      
      In netperf tests this didn't result in any performance drawbacks when
      memory is not too fragmented and made it trigger ENOMEM way less often.
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cea8768f
  29. 18 2月, 2016 1 次提交
  30. 29 1月, 2016 1 次提交
  31. 18 1月, 2016 1 次提交
  32. 16 1月, 2016 1 次提交
  33. 06 1月, 2016 1 次提交