1. 16 7月, 2017 1 次提交
    • A
      sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() · b1f5bfc2
      Alexander Potapenko 提交于
      If the length field of the iterator (|pos.p| or |err|) is past the end
      of the chunk, we shouldn't access it.
      
      This bug has been detected by KMSAN. For the following pair of system
      calls:
      
        socket(PF_INET6, SOCK_STREAM, 0x84 /* IPPROTO_??? */) = 3
        sendto(3, "A", 1, MSG_OOB, {sa_family=AF_INET6, sin6_port=htons(0),
               inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0,
               sin6_scope_id=0}, 28) = 1
      
      the tool has reported a use of uninitialized memory:
      
        ==================================================================
        BUG: KMSAN: use of uninitialized memory in sctp_rcv+0x17b8/0x43b0
        CPU: 1 PID: 2940 Comm: probe Not tainted 4.11.0-rc5+ #2926
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
        01/01/2011
        Call Trace:
         <IRQ>
         __dump_stack lib/dump_stack.c:16
         dump_stack+0x172/0x1c0 lib/dump_stack.c:52
         kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:927
         __msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:469
         __sctp_rcv_init_lookup net/sctp/input.c:1074
         __sctp_rcv_lookup_harder net/sctp/input.c:1233
         __sctp_rcv_lookup net/sctp/input.c:1255
         sctp_rcv+0x17b8/0x43b0 net/sctp/input.c:170
         sctp6_rcv+0x32/0x70 net/sctp/ipv6.c:984
         ip6_input_finish+0x82f/0x1ee0 net/ipv6/ip6_input.c:279
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_input+0x239/0x290 net/ipv6/ip6_input.c:322
         dst_input ./include/net/dst.h:492
         ip6_rcv_finish net/ipv6/ip6_input.c:69
         NF_HOOK ./include/linux/netfilter.h:257
         ipv6_rcv+0x1dbd/0x22e0 net/ipv6/ip6_input.c:203
         __netif_receive_skb_core+0x2f6f/0x3a20 net/core/dev.c:4208
         __netif_receive_skb net/core/dev.c:4246
         process_backlog+0x667/0xba0 net/core/dev.c:4866
         napi_poll net/core/dev.c:5268
         net_rx_action+0xc95/0x1590 net/core/dev.c:5333
         __do_softirq+0x485/0x942 kernel/softirq.c:284
         do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
         </IRQ>
         do_softirq kernel/softirq.c:328
         __local_bh_enable_ip+0x25b/0x290 kernel/softirq.c:181
         local_bh_enable+0x37/0x40 ./include/linux/bottom_half.h:31
         rcu_read_unlock_bh ./include/linux/rcupdate.h:931
         ip6_finish_output2+0x19b2/0x1cf0 net/ipv6/ip6_output.c:124
         ip6_finish_output+0x764/0x970 net/ipv6/ip6_output.c:149
         NF_HOOK_COND ./include/linux/netfilter.h:246
         ip6_output+0x456/0x520 net/ipv6/ip6_output.c:163
         dst_output ./include/net/dst.h:486
         NF_HOOK ./include/linux/netfilter.h:257
         ip6_xmit+0x1841/0x1c00 net/ipv6/ip6_output.c:261
         sctp_v6_xmit+0x3b7/0x470 net/sctp/ipv6.c:225
         sctp_packet_transmit+0x38cb/0x3a20 net/sctp/output.c:632
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         entry_SYSCALL64_slow_path+0x25/0x25 arch/x86/entry/entry_64.S:246
        RIP: 0033:0x401133
        RSP: 002b:00007fff6d99cd38 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
        RAX: ffffffffffffffda RBX: 00000000004002b0 RCX: 0000000000401133
        RDX: 0000000000000001 RSI: 0000000000494088 RDI: 0000000000000003
        RBP: 00007fff6d99cd90 R08: 00007fff6d99cd50 R09: 000000000000001c
        R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
        R13: 00000000004063d0 R14: 0000000000406460 R15: 0000000000000000
        origin:
         save_stack_trace+0x37/0x40 arch/x86/kernel/stacktrace.c:59
         kmsan_save_stack_with_flags mm/kmsan/kmsan.c:302
         kmsan_internal_poison_shadow+0xb1/0x1a0 mm/kmsan/kmsan.c:198
         kmsan_poison_shadow+0x6d/0xc0 mm/kmsan/kmsan.c:211
         slab_alloc_node mm/slub.c:2743
         __kmalloc_node_track_caller+0x200/0x360 mm/slub.c:4351
         __kmalloc_reserve net/core/skbuff.c:138
         __alloc_skb+0x26b/0x840 net/core/skbuff.c:231
         alloc_skb ./include/linux/skbuff.h:933
         sctp_packet_transmit+0x31e/0x3a20 net/sctp/output.c:570
         sctp_outq_flush+0xeb3/0x46e0 net/sctp/outqueue.c:885
         sctp_outq_uncork+0xb2/0xd0 net/sctp/outqueue.c:750
         sctp_side_effects net/sctp/sm_sideeffect.c:1773
         sctp_do_sm+0x6962/0x6ec0 net/sctp/sm_sideeffect.c:1147
         sctp_primitive_ASSOCIATE+0x12c/0x160 net/sctp/primitive.c:88
         sctp_sendmsg+0x43e5/0x4f90 net/sctp/socket.c:1954
         inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
         sock_sendmsg_nosec net/socket.c:633
         sock_sendmsg net/socket.c:643
         SYSC_sendto+0x608/0x710 net/socket.c:1696
         SyS_sendto+0x8a/0xb0 net/socket.c:1664
         do_syscall_64+0xe6/0x130 arch/x86/entry/common.c:285
         return_from_SYSCALL_64+0x0/0x6a arch/x86/entry/entry_64.S:246
        ==================================================================
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1f5bfc2
  2. 02 7月, 2017 2 次提交
  3. 05 4月, 2017 1 次提交
  4. 04 4月, 2017 1 次提交
    • X
      sctp: check for dst and pathmtu update in sctp_packet_config · df2729c3
      Xin Long 提交于
      This patch is to move sctp_transport_dst_check into sctp_packet_config
      from sctp_packet_transmit and add pathmtu check in sctp_packet_config.
      
      With this fix, sctp can update dst or pathmtu before appending chunks,
      which can void dropping packets in sctp_packet_transmit when dst is
      obsolete or dst's mtu is changed.
      
      This patch is also to improve some other codes in sctp_packet_config.
      It updates packet max_size with gso_max_size, checks for dst and
      pathmtu, and appends ecne chunk only when packet is empty and asoc
      is not NULL.
      
      It makes sctp flush work better, as we only need to set up them once
      for one flush schedule. It's also safe, since asoc is NULL only when
      the packet is created by sctp_ootb_pkt_new in which it just gets the
      new dst, no need to do more things for it other than set packet with
      transport's pathmtu.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df2729c3
  5. 20 2月, 2017 1 次提交
    • X
      sctp: sctp_transport_dst_check should check if transport pmtu is dst mtu · a4d69a4c
      Xin Long 提交于
      Now when sending a packet, sctp_transport_dst_check will check if dst
      is obsolete by calling ipv4/ip6_dst_check. But they return obsolete
      only when adding a new cache, after that when the cache's pmtu is
      updated again, it will not trigger transport->dst/pmtu's update.
      
      It can be reproduced by reducing route's pmtu twice. At the 1st time
      client will add a new cache, and transport->pathmtu gets updated as
      sctp_transport_dst_check finds it's obsolete. But at the 2nd time,
      cache's mtu is updated, sctp client will never send out any packet,
      because transport->pmtu has no chance to update.
      
      This patch is to fix this by also checking if transport pmtu is dst
      mtu in sctp_transport_dst_check, so that transport->pmtu can be
      updated on time.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a4d69a4c
  6. 10 2月, 2017 2 次提交
  7. 08 2月, 2017 1 次提交
  8. 19 1月, 2017 2 次提交
  9. 07 1月, 2017 1 次提交
    • X
      sctp: prepare asoc stream for stream reconf · a8386317
      Xin Long 提交于
      sctp stream reconf, described in RFC 6525, needs a structure to
      save per stream information in assoc, like stream state.
      
      In the future, sctp stream scheduler also needs it to save some
      stream scheduler params and queues.
      
      This patchset is to prepare the stream array in assoc for stream
      reconf. It defines sctp_stream that includes stream arrays inside
      to replace ssnmap.
      
      Note that we use different structures for IN and OUT streams, as
      the members in per OUT stream will get more and more different
      from per IN stream.
      
      v1->v2:
        - put these patches into a smaller group.
      v2->v3:
        - define sctp_stream to contain stream arrays, and create stream.c
          to put stream-related functions.
        - merge 3 patches into 1, as new sctp_stream has the same name
          with before.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8386317
  10. 25 12月, 2016 1 次提交
  11. 17 11月, 2016 1 次提交
    • X
      sctp: use new rhlist interface on sctp transport rhashtable · 7fda702f
      Xin Long 提交于
      Now sctp transport rhashtable uses hash(lport, dport, daddr) as the key
      to hash a node to one chain. If in one host thousands of assocs connect
      to one server with the same lport and different laddrs (although it's
      not a normal case), all the transports would be hashed into the same
      chain.
      
      It may cause to keep returning -EBUSY when inserting a new node, as the
      chain is too long and sctp inserts a transport node in a loop, which
      could even lead to system hangs there.
      
      The new rhlist interface works for this case that there are many nodes
      with the same key in one chain. It puts them into a list then makes this
      list be as a node of the chain.
      
      This patch is to replace rhashtable_ interface with rhltable_ interface.
      Since a chain would not be too long and it would not return -EBUSY with
      this fix when inserting a node, the reinsert loop is also removed here.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7fda702f
  12. 01 11月, 2016 1 次提交
    • X
      sctp: hold transport instead of assoc when lookup assoc in rx path · dae399d7
      Xin Long 提交于
      Prior to this patch, in rx path, before calling lock_sock, it needed to
      hold assoc when got it by __sctp_lookup_association, in case other place
      would free/put assoc.
      
      But in __sctp_lookup_association, it lookup and hold transport, then got
      assoc by transport->assoc, then hold assoc and put transport. It means
      it didn't hold transport, yet it was returned and later on directly
      assigned to chunk->transport.
      
      Without the protection of sock lock, the transport may be freed/put by
      other places, which would cause a use-after-free issue.
      
      This patch is to fix this issue by holding transport instead of assoc.
      As holding transport can make sure to access assoc is also safe, and
      actually it looks up assoc by searching transport rhashtable, to hold
      transport here makes more sense.
      
      Note that the function will be renamed later on on another patch.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dae399d7
  13. 22 9月, 2016 1 次提交
  14. 04 6月, 2016 1 次提交
    • M
      sctp: Add GSO support · 90017acc
      Marcelo Ricardo Leitner 提交于
      SCTP has this pecualiarity that its packets cannot be just segmented to
      (P)MTU. Its chunks must be contained in IP segments, padding respected.
      So we can't just generate a big skb, set gso_size to the fragmentation
      point and deliver it to IP layer.
      
      This patch takes a different approach. SCTP will now build a skb as it
      would be if it was received using GRO. That is, there will be a cover
      skb with protocol headers and children ones containing the actual
      segments, already segmented to a way that respects SCTP RFCs.
      
      With that, we can tell skb_segment() to just split based on frag_list,
      trusting its sizes are already in accordance.
      
      This way SCTP can benefit from GSO and instead of passing several
      packets through the stack, it can pass a single large packet.
      
      v2:
      - Added support for receiving GSO frames, as requested by Dave Miller.
      - Clear skb->cb if packet is GSO (otherwise it's not used by SCTP)
      - Added heuristics similar to what we have in TCP for not generating
        single GSO packets that fills cwnd.
      v3:
      - consider sctphdr size in skb_gso_transport_seglen()
      - rebased due to 5c7cdf33 ("gso: Remove arbitrary checks for
        unsupported GSO")
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Tested-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90017acc
  15. 28 4月, 2016 3 次提交
  16. 16 4月, 2016 3 次提交
    • X
      sctp: export some apis or variables for sctp_diag and reuse some for proc · 626d16f5
      Xin Long 提交于
      For some main variables in sctp.ko, we couldn't export it to other modules,
      so we have to define some api to access them.
      
      It will include sctp transport and endpoint's traversal.
      
      There are some transport traversal functions for sctp_diag, we can also
      use it for sctp_proc. cause they have the similar situation to traversal
      transport.
      
      v2->v3:
      - rhashtable_walk_init need the parameter gfp, because of recent upstrem
        update
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      626d16f5
    • X
      sctp: add sctp_info dump api for sctp_diag · 52c52a61
      Xin Long 提交于
      sctp_diag will dump some important details of sctp's assoc or ep, we use
      sctp_info to describe them,  sctp_get_sctp_info to get them, and export
      it to sctp_diag.ko.
      
      v2->v3:
      - we will not use list_for_each_safe in sctp_get_sctp_info, cause
        all the callers of it will use lock_sock.
      
      - fix the holes in struct sctp_info with __reserved* field.
        because sctp_diag is a new feature, and sctp_info is just for now,
        it may be changed in the future.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52c52a61
    • M
      sctp: simplify sk_receive_queue locking · 311b2177
      Marcelo Ricardo Leitner 提交于
      SCTP already serializes access to rcvbuf through its sock lock:
      sctp_recvmsg takes it right in the start and release at the end, while
      rx path will also take the lock before doing any socket processing. On
      sctp_rcv() it will check if there is an user using the socket and, if
      there is, it will queue incoming packets to the backlog. The backlog
      processing will do the same. Even timers will do such check and
      re-schedule if an user is using the socket.
      
      Simplifying this will allow us to remove sctp_skb_list_tail and get ride
      of some expensive lockings.  The lists that it is used on are also
      mangled with functions like __skb_queue_tail and __skb_unlink in the
      same context, like on sctp_ulpq_tail_event() and sctp_clear_pd().
      sctp_close() will also purge those while using only the sock lock.
      
      Therefore the lockings performed by sctp_skb_list_tail() are not
      necessary. This patch removes this function and replaces its calls with
      just skb_queue_splice_tail_init() instead.
      
      The biggest gain is at sctp_ulpq_tail_event(), because the events always
      contain a list, even if it's queueing a single skb and this was
      triggering expensive calls to spin_lock_irqsave/_irqrestore for every
      data chunk received.
      
      As SCTP will deliver each data chunk on a corresponding recvmsg, the
      more effective the change will be.
      Before this patch, with chunks with 30 bytes:
      netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000
      400000 -s 400000 400000
      on a 10Gbit link with 1500 MTU:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       137.45   7.34     7.36     52.504  52.608
      
      With it:
      
      SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET
      Recv   Send    Send                          Utilization       Service Demand
      Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
      Size   Size    Size     Time     Throughput  local    remote   local   remote
      bytes  bytes   bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB
      
      425984 425984     30    60.00       179.10   7.97     6.70     43.740  36.788
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      311b2177
  17. 06 4月, 2016 1 次提交
  18. 21 3月, 2016 2 次提交
    • M
      sctp: keep fragmentation point aligned to word size · 659e0bca
      Marcelo Ricardo Leitner 提交于
      If the user supply a different fragmentation point or if there is a
      network header that cause it to not be aligned, force it to be aligned.
      
      Fragmentation point at a value that is not aligned is not optimal.  It
      causes extra padding to be used and has just no pros.
      
      v2:
       - Make use of the new WORD_TRUNC macro
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      659e0bca
    • M
      sctp: align MTU to a word · 3822a5ff
      Marcelo Ricardo Leitner 提交于
      SCTP is a protocol that is aligned to a word (4 bytes). Thus using bare
      MTU can sometimes return values that are not aligned, like for loopback,
      which is 65536 but ipv4_mtu() limits that to 65535. This mis-alignment
      will cause the last non-aligned bytes to never be used and can cause
      issues with congestion control.
      
      So it's better to just consider a lower MTU and keep congestion control
      calcs saner as they are based on PMTU.
      
      Same applies to icmp frag needed messages, which is also fixed by this
      patch.
      
      One other effect of this is the inability to send MTU-sized packet
      without queueing or fragmentation and without hitting Nagle. As the
      check performed at sctp_packet_can_append_data():
      
      if (chunk->skb->len + q->out_qlen >= transport->pathmtu - packet->overhead)
      	/* Enough data queued to fill a packet */
      	return SCTP_XMIT_OK;
      
      with the above example of MTU, if there are no other messages queued,
      one cannot send a packet that just fits one packet (65532 bytes) and
      without causing DATA chunk fragmentation or a delay.
      
      v2:
       - Added WORD_TRUNC macro
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3822a5ff
  19. 06 1月, 2016 2 次提交
  20. 28 5月, 2015 1 次提交
  21. 25 3月, 2015 1 次提交
  22. 15 10月, 2014 1 次提交
    • D
      net: sctp: fix panic on duplicate ASCONF chunks · b69040d8
      Daniel Borkmann 提交于
      When receiving a e.g. semi-good formed connection scan in the
      form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---------------- ASCONF_a; ASCONF_b ----------------->
      
      ... where ASCONF_a equals ASCONF_b chunk (at least both serials
      need to be equal), we panic an SCTP server!
      
      The problem is that good-formed ASCONF chunks that we reply with
      ASCONF_ACK chunks are cached per serial. Thus, when we receive a
      same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
      not need to process them again on the server side (that was the
      idea, also proposed in the RFC). Instead, we know it was cached
      and we just resend the cached chunk instead. So far, so good.
      
      Where things get nasty is in SCTP's side effect interpreter, that
      is, sctp_cmd_interpreter():
      
      While incoming ASCONF_a (chunk = event_arg) is being marked
      !end_of_packet and !singleton, and we have an association context,
      we do not flush the outqueue the first time after processing the
      ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
      queued up, although we set local_cork to 1. Commit 2e3216cd
      changed the precedence, so that as long as we get bundled, incoming
      chunks we try possible bundling on outgoing queue as well. Before
      this commit, we would just flush the output queue.
      
      Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
      continue to process the same ASCONF_b chunk from the packet. As
      we have cached the previous ASCONF_ACK, we find it, grab it and
      do another SCTP_CMD_REPLY command on it. So, effectively, we rip
      the chunk->list pointers and requeue the same ASCONF_ACK chunk
      another time. Since we process ASCONF_b, it's correctly marked
      with end_of_packet and we enforce an uncork, and thus flush, thus
      crashing the kernel.
      
      Fix it by testing if the ASCONF_ACK is currently pending and if
      that is the case, do not requeue it. When flushing the output
      queue we may relink the chunk for preparing an outgoing packet,
      but eventually unlink it when it's copied into the skb right
      before transmission.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b69040d8
  23. 30 8月, 2014 1 次提交
    • D
      net: sctp: fix ABI mismatch through sctp_assoc_to_state helper · 38ab1fa9
      Daniel Borkmann 提交于
      Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
      tree, the official <netinet/sctp.h> header carries a copy of enum
      sctp_sstat_state that looks like (compared to the current in-kernel
      enumeration):
      
        User definition:                     Kernel definition:
      
        enum sctp_sstat_state {              typedef enum {
          SCTP_EMPTY             = 0,          <removed>
          SCTP_CLOSED            = 1,          SCTP_STATE_CLOSED            = 0,
          SCTP_COOKIE_WAIT       = 2,          SCTP_STATE_COOKIE_WAIT       = 1,
          SCTP_COOKIE_ECHOED     = 3,          SCTP_STATE_COOKIE_ECHOED     = 2,
          SCTP_ESTABLISHED       = 4,          SCTP_STATE_ESTABLISHED       = 3,
          SCTP_SHUTDOWN_PENDING  = 5,          SCTP_STATE_SHUTDOWN_PENDING  = 4,
          SCTP_SHUTDOWN_SENT     = 6,          SCTP_STATE_SHUTDOWN_SENT     = 5,
          SCTP_SHUTDOWN_RECEIVED = 7,          SCTP_STATE_SHUTDOWN_RECEIVED = 6,
          SCTP_SHUTDOWN_ACK_SENT = 8,          SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
        };                                   } sctp_state_t;
      
      This header was later on also placed into the uapi, so that user space
      programs can compile without having <netinet/sctp.h>, but the shipped
      with <linux/sctp.h> instead.
      
      While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
      sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
      nevertheless have a what it appears to be dummy SCTP_EMPTY state from
      the very early days.
      
      While it seems to do just nothing, commit 0b8f9e25 ("sctp: remove
      completely unsed EMPTY state") did the right thing and removed this dead
      code. That however, causes an off-by-one when the user asks the SCTP
      stack via SCTP_STATUS API and checks for the current socket state thus
      yielding possibly undefined behaviour in applications as they expect
      the kernel to tell the right thing.
      
      The enumeration had to be changed however as based on the current socket
      state, we access a function pointer lookup-table through this. Therefore,
      I think the best way to deal with this is just to add a helper function
      sctp_assoc_to_state() to encapsulate the off-by-one quirk.
      Reported-by: NTristan Su <sooqing@gmail.com>
      Fixes: 0b8f9e25 ("sctp: remove completely unsed EMPTY state")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38ab1fa9
  24. 01 8月, 2014 1 次提交
    • J
      sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe 提交于
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a sycall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zerod when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      299ee123
  25. 17 7月, 2014 1 次提交
    • G
      net: sctp: implement rfc6458, 5.3.6. SCTP_NXTINFO cmsg support · 2347c80f
      Geir Ola Vaagland 提交于
      This patch implements section 5.3.6. of RFC6458, that is, support
      for 'SCTP Next Receive Information Structure' (SCTP_NXTINFO) which
      is placed into ancillary data cmsghdr structure for each recvmsg()
      call, if this information is already available when delivering the
      current message.
      
      This option can be enabled/disabled via setsockopt(2) on SOL_SCTP
      level by setting an int value with 1/0 for SCTP_RECVNXTINFO in
      user space applications as per RFC6458, section 8.1.30.
      
      The sctp_nxtinfo structure is defined as per RFC as below ...
      
        struct sctp_nxtinfo {
          uint16_t nxt_sid;
          uint16_t nxt_flags;
          uint32_t nxt_ppid;
          uint32_t nxt_length;
          sctp_assoc_t nxt_assoc_id;
        };
      
      ... and provided under cmsg_level IPPROTO_SCTP, cmsg_type
      SCTP_NXTINFO, while cmsg_data[] contains struct sctp_nxtinfo.
      
      Joint work with Daniel Borkmann.
      Signed-off-by: NGeir Ola Vaagland <geirola@gmail.com>
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2347c80f
  26. 03 7月, 2014 1 次提交
    • D
      net: sctp: improve timer slack calculation for transport HBs · 8f61059a
      Daniel Borkmann 提交于
      RFC4960, section 8.3 says:
      
        On an idle destination address that is allowed to heartbeat,
        it is recommended that a HEARTBEAT chunk is sent once per RTO
        of that destination address plus the protocol parameter
        'HB.interval', with jittering of +/- 50% of the RTO value,
        and exponential backoff of the RTO if the previous HEARTBEAT
        is unanswered.
      
      Currently, we calculate jitter via sctp_jitter() function first,
      and then add its result to the current RTO for the new timeout:
      
        TMO = RTO + (RAND() % RTO) - (RTO / 2)
                    `------------------------^-=> sctp_jitter()
      
      Instead, we can just simplify all this by directly calculating:
      
        TMO = (RTO / 2) + (RAND() % RTO)
      
      With the help of prandom_u32_max(), we don't need to open code
      our own global PRNG, but can instead just make use of the per
      CPU implementation of prandom with better quality numbers. Also,
      we can now spare us the conditional for divide by zero check
      since no div or mod operation needs to be used. Note that
      prandom_u32_max() won't emit the same result as a mod operation,
      but we really don't care here as we only want to have a random
      number scaled into RTO interval.
      
      Note, exponential RTO backoff is handeled elsewhere, namely in
      sctp_do_8_2_transport_strike().
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f61059a
  27. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  28. 22 1月, 2014 4 次提交