1. 30 6月, 2015 1 次提交
    • A
      sctp: Fix race between OOTB responce and route removal · 29c4afc4
      Alexander Sverdlin 提交于
      There is NULL pointer dereference possible during statistics update if the route
      used for OOTB responce is removed at unfortunate time. If the route exists when
      we receive OOTB packet and we finally jump into sctp_packet_transmit() to send
      ABORT, but in the meantime route is removed under our feet, we take "no_route"
      path and try to update stats with IP_INC_STATS(sock_net(asoc->base.sk), ...).
      
      But sctp_ootb_pkt_new() used to prepare responce packet doesn't call
      sctp_transport_set_owner() and therefore there is no asoc associated with this
      packet. Probably temporary asoc just for OOTB responces is overkill, so just
      introduce a check like in all other places in sctp_packet_transmit(), where
      "asoc" is dereferenced.
      
      To reproduce this, one needs to
      0. ensure that sctp module is loaded (otherwise ABORT is not generated)
      1. remove default route on the machine
      2. while true; do
           ip route del [interface-specific route]
           ip route add [interface-specific route]
         done
      3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
         responce
      
      On x86_64 the crash looks like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
      IP: [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      PGD 0
      Oops: 0000 [#1] PREEMPT SMP
      Modules linked in: ...
      CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-1-ARCH #1
      Hardware name: ...
      task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
      RIP: 0010:[<ffffffffa05ec9ac>]  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
      RSP: 0018:ffff880127c037b8  EFLAGS: 00010296
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
      RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
      RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
      R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
      R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
      FS:  0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
      Stack:
       ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
       ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
       0000000000000000 0000000000000001 0000000000000000 0000000000000000
      Call Trace:
       <IRQ>
       [<ffffffffa05c94c5>] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
       [<ffffffffa05d6b42>] ? sctp_transport_put+0x52/0x80 [sctp]
       [<ffffffffa05d0bfc>] sctp_do_sm+0xb8c/0x19a0 [sctp]
       [<ffffffff810b0e00>] ? trigger_load_balance+0x90/0x210
       [<ffffffff810e0329>] ? update_process_times+0x59/0x60
       [<ffffffff812c7a40>] ? timerqueue_add+0x60/0xb0
       [<ffffffff810e0549>] ? enqueue_hrtimer+0x29/0xa0
       [<ffffffff8101f599>] ? read_tsc+0x9/0x10
       [<ffffffff8116d4b5>] ? put_page+0x55/0x60
       [<ffffffff810ee1ad>] ? clockevents_program_event+0x6d/0x100
       [<ffffffff81462b68>] ? skb_free_head+0x58/0x80
       [<ffffffffa029a10b>] ? chksum_update+0x1b/0x27 [crc32c_generic]
       [<ffffffff81283f3e>] ? crypto_shash_update+0xce/0xf0
       [<ffffffffa05d3993>] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
       [<ffffffffa05dd4e6>] sctp_inq_push+0x46/0x60 [sctp]
       [<ffffffffa05ed7a0>] sctp_rcv+0x880/0x910 [sctp]
       [<ffffffffa05ecb50>] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
       [<ffffffffa05ecb70>] ? sctp_csum_update+0x20/0x20 [sctp]
       [<ffffffff814b05a5>] ? ip_route_input_noref+0x235/0xd30
       [<ffffffff81051d6b>] ? ack_ioapic_level+0x7b/0x150
       [<ffffffff814b27be>] ip_local_deliver_finish+0xae/0x210
       [<ffffffff814b2e15>] ip_local_deliver+0x35/0x90
       [<ffffffff814b2a15>] ip_rcv_finish+0xf5/0x370
       [<ffffffff814b3128>] ip_rcv+0x2b8/0x3a0
       [<ffffffff81474193>] __netif_receive_skb_core+0x763/0xa50
       [<ffffffff81476c28>] __netif_receive_skb+0x18/0x60
       [<ffffffff81476cb0>] netif_receive_skb_internal+0x40/0xd0
       [<ffffffff814776c8>] napi_gro_receive+0xe8/0x120
       [<ffffffffa03946aa>] rtl8169_poll+0x2da/0x660 [r8169]
       [<ffffffff8147896a>] net_rx_action+0x21a/0x360
       [<ffffffff81078dc1>] __do_softirq+0xe1/0x2d0
       [<ffffffff8107912d>] irq_exit+0xad/0xb0
       [<ffffffff8157d158>] do_IRQ+0x58/0xf0
       [<ffffffff8157b06d>] common_interrupt+0x6d/0x6d
       <EOI>
       [<ffffffff810e1218>] ? hrtimer_start+0x18/0x20
       [<ffffffffa05d65f9>] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
       [<ffffffff81020c50>] ? mwait_idle+0x60/0xa0
       [<ffffffff810216ef>] arch_cpu_idle+0xf/0x20
       [<ffffffff810b731c>] cpu_startup_entry+0x3ec/0x480
       [<ffffffff8156b365>] rest_init+0x85/0x90
       [<ffffffff818eb035>] start_kernel+0x48b/0x4ac
       [<ffffffff818ea120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff818ea339>] x86_64_start_reservations+0x2a/0x2c
       [<ffffffff818ea49c>] x86_64_start_kernel+0x161/0x184
      Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 <48> 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
      RIP  [<ffffffffa05ec9ac>] sctp_packet_transmit+0x63c/0x730 [sctp]
       RSP <ffff880127c037b8>
      CR2: 0000000000000020
      ---[ end trace 5aec7fd2dc983574 ]---
      Kernel panic - not syncing: Fatal exception in interrupt
      Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
      drm_kms_helper: panic occurred, switching back to text console
      ---[ end Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: NAlexander Sverdlin <alexander.sverdlin@nokia.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29c4afc4
  2. 29 6月, 2015 1 次提交
  3. 15 6月, 2015 1 次提交
    • M
      sctp: fix ASCONF list handling · 2d45a02d
      Marcelo Ricardo Leitner 提交于
      ->auto_asconf_splist is per namespace and mangled by functions like
      sctp_setsockopt_auto_asconf() which doesn't guarantee any serialization.
      
      Also, the call to inet_sk_copy_descendant() was backuping
      ->auto_asconf_list through the copy but was not honoring
      ->do_auto_asconf, which could lead to list corruption if it was
      different between both sockets.
      
      This commit thus fixes the list handling by using ->addr_wq_lock
      spinlock to protect the list. A special handling is done upon socket
      creation and destruction for that. Error handlig on sctp_init_sock()
      will never return an error after having initialized asconf, so
      sctp_destroy_sock() can be called without addrq_wq_lock. The lock now
      will be take on sctp_close_sock(), before locking the socket, so we
      don't do it in inverse order compared to sctp_addr_wq_timeout_handler().
      
      Instead of taking the lock on sctp_sock_migrate() for copying and
      restoring the list values, it's preferred to avoid rewritting it by
      implementing sctp_copy_descendant().
      
      Issue was found with a test application that kept flipping sysctl
      default_auto_asconf on and off, but one could trigger it by issuing
      simultaneous setsockopt() calls on multiple sockets or by
      creating/destroying sockets fast enough. This is only triggerable
      locally.
      
      Fixes: 9f7d653b ("sctp: Add Auto-ASCONF support (core).")
      Reported-by: NJi Jianwen <jiji@redhat.com>
      Suggested-by: NNeil Horman <nhorman@tuxdriver.com>
      Suggested-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d45a02d
  4. 13 6月, 2015 1 次提交
  5. 26 5月, 2015 2 次提交
  6. 11 5月, 2015 1 次提交
  7. 25 3月, 2015 1 次提交
  8. 03 3月, 2015 1 次提交
  9. 02 3月, 2015 1 次提交
  10. 03 2月, 2015 1 次提交
  11. 31 1月, 2015 1 次提交
  12. 27 1月, 2015 1 次提交
    • D
      net: sctp: fix slab corruption from use after free on INIT collisions · 600ddd68
      Daniel Borkmann 提交于
      When hitting an INIT collision case during the 4WHS with AUTH enabled, as
      already described in detail in commit 1be9a950 ("net: sctp: inherit
      auth_capable on INIT collisions"), it can happen that we occasionally
      still remotely trigger the following panic on server side which seems to
      have been uncovered after the fix from commit 1be9a950 ...
      
      [  533.876389] BUG: unable to handle kernel paging request at 00000000ffffffff
      [  533.913657] IP: [<ffffffff811ac385>] __kmalloc+0x95/0x230
      [  533.940559] PGD 5030f2067 PUD 0
      [  533.957104] Oops: 0000 [#1] SMP
      [  533.974283] Modules linked in: sctp mlx4_en [...]
      [  534.939704] Call Trace:
      [  534.951833]  [<ffffffff81294e30>] ? crypto_init_shash_ops+0x60/0xf0
      [  534.984213]  [<ffffffff81294e30>] crypto_init_shash_ops+0x60/0xf0
      [  535.015025]  [<ffffffff8128c8ed>] __crypto_alloc_tfm+0x6d/0x170
      [  535.045661]  [<ffffffff8128d12c>] crypto_alloc_base+0x4c/0xb0
      [  535.074593]  [<ffffffff8160bd42>] ? _raw_spin_lock_bh+0x12/0x50
      [  535.105239]  [<ffffffffa0418c11>] sctp_inet_listen+0x161/0x1e0 [sctp]
      [  535.138606]  [<ffffffff814e43bd>] SyS_listen+0x9d/0xb0
      [  535.166848]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      ... or depending on the the application, for example this one:
      
      [ 1370.026490] BUG: unable to handle kernel paging request at 00000000ffffffff
      [ 1370.026506] IP: [<ffffffff811ab455>] kmem_cache_alloc+0x75/0x1d0
      [ 1370.054568] PGD 633c94067 PUD 0
      [ 1370.070446] Oops: 0000 [#1] SMP
      [ 1370.085010] Modules linked in: sctp kvm_amd kvm [...]
      [ 1370.963431] Call Trace:
      [ 1370.974632]  [<ffffffff8120f7cf>] ? SyS_epoll_ctl+0x53f/0x960
      [ 1371.000863]  [<ffffffff8120f7cf>] SyS_epoll_ctl+0x53f/0x960
      [ 1371.027154]  [<ffffffff812100d3>] ? anon_inode_getfile+0xd3/0x170
      [ 1371.054679]  [<ffffffff811e3d67>] ? __alloc_fd+0xa7/0x130
      [ 1371.080183]  [<ffffffff816149a9>] system_call_fastpath+0x16/0x1b
      
      With slab debugging enabled, we can see that the poison has been overwritten:
      
      [  669.826368] BUG kmalloc-128 (Tainted: G        W     ): Poison overwritten
      [  669.826385] INFO: 0xffff880228b32e50-0xffff880228b32e50. First byte 0x6a instead of 0x6b
      [  669.826414] INFO: Allocated in sctp_auth_create_key+0x23/0x50 [sctp] age=3 cpu=0 pid=18494
      [  669.826424]  __slab_alloc+0x4bf/0x566
      [  669.826433]  __kmalloc+0x280/0x310
      [  669.826453]  sctp_auth_create_key+0x23/0x50 [sctp]
      [  669.826471]  sctp_auth_asoc_create_secret+0xcb/0x1e0 [sctp]
      [  669.826488]  sctp_auth_asoc_init_active_key+0x68/0xa0 [sctp]
      [  669.826505]  sctp_do_sm+0x29d/0x17c0 [sctp] [...]
      [  669.826629] INFO: Freed in kzfree+0x31/0x40 age=1 cpu=0 pid=18494
      [  669.826635]  __slab_free+0x39/0x2a8
      [  669.826643]  kfree+0x1d6/0x230
      [  669.826650]  kzfree+0x31/0x40
      [  669.826666]  sctp_auth_key_put+0x19/0x20 [sctp]
      [  669.826681]  sctp_assoc_update+0x1ee/0x2d0 [sctp]
      [  669.826695]  sctp_do_sm+0x674/0x17c0 [sctp]
      
      Since this only triggers in some collision-cases with AUTH, the problem at
      heart is that sctp_auth_key_put() on asoc->asoc_shared_key is called twice
      when having refcnt 1, once directly in sctp_assoc_update() and yet again
      from within sctp_auth_asoc_init_active_key() via sctp_assoc_update() on
      the already kzfree'd memory, which is also consistent with the observation
      of the poison decrease from 0x6b to 0x6a (note: the overwrite is detected
      at a later point in time when poison is checked on new allocation).
      
      Reference counting of auth keys revisited:
      
      Shared keys for AUTH chunks are being stored in endpoints and associations
      in endpoint_shared_keys list. On endpoint creation, a null key is being
      added; on association creation, all endpoint shared keys are being cached
      and thus cloned over to the association. struct sctp_shared_key only holds
      a pointer to the actual key bytes, that is, struct sctp_auth_bytes which
      keeps track of users internally through refcounting. Naturally, on assoc
      or enpoint destruction, sctp_shared_key are being destroyed directly and
      the reference on sctp_auth_bytes dropped.
      
      User space can add keys to either list via setsockopt(2) through struct
      sctp_authkey and by passing that to sctp_auth_set_key() which replaces or
      adds a new auth key. There, sctp_auth_create_key() creates a new sctp_auth_bytes
      with refcount 1 and in case of replacement drops the reference on the old
      sctp_auth_bytes. A key can be set active from user space through setsockopt()
      on the id via sctp_auth_set_active_key(), which iterates through either
      endpoint_shared_keys and in case of an assoc, invokes (one of various places)
      sctp_auth_asoc_init_active_key().
      
      sctp_auth_asoc_init_active_key() computes the actual secret from local's
      and peer's random, hmac and shared key parameters and returns a new key
      directly as sctp_auth_bytes, that is asoc->asoc_shared_key, plus drops
      the reference if there was a previous one. The secret, which where we
      eventually double drop the ref comes from sctp_auth_asoc_set_secret() with
      intitial refcount of 1, which also stays unchanged eventually in
      sctp_assoc_update(). This key is later being used for crypto layer to
      set the key for the hash in crypto_hash_setkey() from sctp_auth_calculate_hmac().
      
      To close the loop: asoc->asoc_shared_key is freshly allocated secret
      material and independant of the sctp_shared_key management keeping track
      of only shared keys in endpoints and assocs. Hence, also commit 4184b2a7
      ("net: sctp: fix memory leak in auth key management") is independant of
      this bug here since it concerns a different layer (though same structures
      being used eventually). asoc->asoc_shared_key is reference dropped correctly
      on assoc destruction in sctp_association_free() and when active keys are
      being replaced in sctp_auth_asoc_init_active_key(), it always has a refcount
      of 1. Hence, it's freed prematurely in sctp_assoc_update(). Simple fix is
      to remove that sctp_auth_key_put() from there which fixes these panics.
      
      Fixes: 730fc3d0 ("[SCTP]: Implete SCTP-AUTH parameter processing")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      600ddd68
  13. 18 1月, 2015 1 次提交
    • D
      net: sctp: fix race for one-to-many sockets in sendmsg's auto associate · 2061dcd6
      Daniel Borkmann 提交于
      I.e. one-to-many sockets in SCTP are not required to explicitly
      call into connect(2) or sctp_connectx(2) prior to data exchange.
      Instead, they can directly invoke sendmsg(2) and the SCTP stack
      will automatically trigger connection establishment through 4WHS
      via sctp_primitive_ASSOCIATE(). However, this in its current
      implementation is racy: INIT is being sent out immediately (as
      it cannot be bundled anyway) and the rest of the DATA chunks are
      queued up for later xmit when connection is established, meaning
      sendmsg(2) will return successfully. This behaviour can result
      in an undesired side-effect that the kernel made the application
      think the data has already been transmitted, although none of it
      has actually left the machine, worst case even after close(2)'ing
      the socket.
      
      Instead, when the association from client side has been shut down
      e.g. first gracefully through SCTP_EOF and then close(2), the
      client could afterwards still receive the server's INIT_ACK due
      to a connection with higher latency. This INIT_ACK is then considered
      out of the blue and hence responded with ABORT as there was no
      alive assoc found anymore. This can be easily reproduced f.e.
      with sctp_test application from lksctp. One way to fix this race
      is to wait for the handshake to actually complete.
      
      The fix defers waiting after sctp_primitive_ASSOCIATE() and
      sctp_primitive_SEND() succeeded, so that DATA chunks cooked up
      from sctp_sendmsg() have already been placed into the output
      queue through the side-effect interpreter, and therefore can then
      be bundeled together with COOKIE_ECHO control chunks.
      
      strace from example application (shortened):
      
      socket(PF_INET, SOCK_SEQPACKET, IPPROTO_SCTP) = 3
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(1)=[{"hello", 5}], msg_controllen=0, msg_flags=0}, 0) = 5
      sendmsg(3, {msg_name(28)={sa_family=AF_INET, sin_port=htons(8888), sin_addr=inet_addr("192.168.1.115")},
                 msg_iov(0)=[], msg_controllen=48, {cmsg_len=48, cmsg_level=0x84 /* SOL_??? */, cmsg_type=, ...},
                 msg_flags=0}, 0) = 0 // graceful shutdown for SOCK_SEQPACKET via SCTP_EOF
      close(3) = 0
      
      tcpdump before patch (fooling the application):
      
      22:33:36.306142 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 3879023686] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3139201684]
      22:33:36.316619 IP 192.168.1.115.8888 > 192.168.1.114.41462: sctp (1) [INIT ACK] [init tag: 3345394793] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 3380109591]
      22:33:36.317600 IP 192.168.1.114.41462 > 192.168.1.115.8888: sctp (1) [ABORT]
      
      tcpdump after patch:
      
      14:28:58.884116 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [INIT] [init tag: 438593213] [rwnd: 106496] [OS: 10] [MIS: 65535] [init TSN: 3092969729]
      14:28:58.888414 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [INIT ACK] [init tag: 381429855] [rwnd: 106496] [OS: 10] [MIS: 10] [init TSN: 2141904492]
      14:28:58.888638 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [COOKIE ECHO] , (2) [DATA] (B)(E) [TSN: 3092969729] [...]
      14:28:58.893278 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [COOKIE ACK] , (2) [SACK] [cum ack 3092969729] [a_rwnd 106491] [#gap acks 0] [#dup tsns 0]
      14:28:58.893591 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969730] [...]
      14:28:59.096963 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969730] [a_rwnd 106496] [#gap acks 0] [#dup tsns 0]
      14:28:59.097086 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [DATA] (B)(E) [TSN: 3092969731] [...] , (2) [DATA] (B)(E) [TSN: 3092969732] [...]
      14:28:59.103218 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SACK] [cum ack 3092969732] [a_rwnd 106486] [#gap acks 0] [#dup tsns 0]
      14:28:59.103330 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN]
      14:28:59.107793 IP 192.168.1.115.8888 > 192.168.1.114.35846: sctp (1) [SHUTDOWN ACK]
      14:28:59.107890 IP 192.168.1.114.35846 > 192.168.1.115.8888: sctp (1) [SHUTDOWN COMPLETE]
      
      Looks like this bug is from the pre-git history museum. ;)
      
      Fixes: 08707d5482df ("lksctp-2_5_31-0_5_1.patch")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2061dcd6
  14. 11 12月, 2014 1 次提交
  15. 10 12月, 2014 2 次提交
  16. 24 11月, 2014 2 次提交
  17. 22 11月, 2014 1 次提交
    • D
      net: sctp: keep owned chunk in destructor_arg instead of skb->cb · f869c912
      Daniel Borkmann 提交于
      It's just silly to hold the skb destructor argument around inside
      skb->cb[] as we currently do in SCTP.
      
      Nowadays, we're sort of cheating on data accounting in the sense
      that due to commit 4c3a5bda ("sctp: Don't charge for data in
      sndbuf again when transmitting packet"), we orphan the skb already
      in the SCTP output path, i.e. giving back charged data memory, and
      use a different destructor only to make sure the sk doesn't vanish
      on skb destruction time. Thus, cb[] is still valid here as we
      operate within the SCTP layer. (It's generally actually a big
      candidate for future rework, imho.)
      
      However, storing the destructor in the cb[] can easily cause issues
      should an non sctp_packet_set_owner_w()'ed skb ever escape the SCTP
      layer, since cb[] may get overwritten by lower layers and thus can
      corrupt the chunk pointer. There are no such issues at present,
      but lets keep the chunk in destructor_arg, as this is the actual
      purpose for it.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f869c912
  18. 12 11月, 2014 3 次提交
    • D
      net: sctp: fix memory leak in auth key management · 4184b2a7
      Daniel Borkmann 提交于
      A very minimal and simple user space application allocating an SCTP
      socket, setting SCTP_AUTH_KEY setsockopt(2) on it and then closing
      the socket again will leak the memory containing the authentication
      key from user space:
      
      unreferenced object 0xffff8800837047c0 (size 16):
        comm "a.out", pid 2789, jiffies 4296954322 (age 192.258s)
        hex dump (first 16 bytes):
          01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff816d7e8e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811c88d8>] __kmalloc+0xe8/0x270
          [<ffffffffa0870c23>] sctp_auth_create_key+0x23/0x50 [sctp]
          [<ffffffffa08718b1>] sctp_auth_set_key+0xa1/0x140 [sctp]
          [<ffffffffa086b383>] sctp_setsockopt+0xd03/0x1180 [sctp]
          [<ffffffff815bfd94>] sock_common_setsockopt+0x14/0x20
          [<ffffffff815beb61>] SyS_setsockopt+0x71/0xd0
          [<ffffffff816e58a9>] system_call_fastpath+0x12/0x17
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      This is bad because of two things, we can bring down a machine from
      user space when auth_enable=1, but also we would leave security sensitive
      keying material in memory without clearing it after use. The issue is
      that sctp_auth_create_key() already sets the refcount to 1, but after
      allocation sctp_auth_set_key() does an additional refcount on it, and
      thus leaving it around when we free the socket.
      
      Fixes: 65b07e5d ("[SCTP]: API updates to suport SCTP-AUTH extensions.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4184b2a7
    • D
      net: sctp: fix NULL pointer dereference in af->from_addr_param on malformed packet · e40607cb
      Daniel Borkmann 提交于
      An SCTP server doing ASCONF will panic on malformed INIT ping-of-death
      in the form of:
      
        ------------ INIT[PARAM: SET_PRIMARY_IP] ------------>
      
      While the INIT chunk parameter verification dissects through many things
      in order to detect malformed input, it misses to actually check parameters
      inside of parameters. E.g. RFC5061, section 4.2.4 proposes a 'set primary
      IP address' parameter in ASCONF, which has as a subparameter an address
      parameter.
      
      So an attacker may send a parameter type other than SCTP_PARAM_IPV4_ADDRESS
      or SCTP_PARAM_IPV6_ADDRESS, param_type2af() will subsequently return 0
      and thus sctp_get_af_specific() returns NULL, too, which we then happily
      dereference unconditionally through af->from_addr_param().
      
      The trace for the log:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
      IP: [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
      PGD 0
      Oops: 0000 [#1] SMP
      [...]
      Pid: 0, comm: swapper Not tainted 2.6.32-504.el6.x86_64 #1 Bochs Bochs
      RIP: 0010:[<ffffffffa01e9c62>]  [<ffffffffa01e9c62>] sctp_process_init+0x492/0x990 [sctp]
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa01f2add>] ? sctp_bind_addr_copy+0x5d/0xe0 [sctp]
       [<ffffffffa01e1fcb>] sctp_sf_do_5_1B_init+0x21b/0x340 [sctp]
       [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
       [<ffffffffa01e5c09>] ? sctp_endpoint_lookup_assoc+0xc9/0xf0 [sctp]
       [<ffffffffa01e61f6>] sctp_endpoint_bh_rcv+0x116/0x230 [sctp]
       [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
       [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
       [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
       [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
      [...]
      
      A minimal way to address this is to check for NULL as we do on all
      other such occasions where we know sctp_get_af_specific() could
      possibly return with NULL.
      
      Fixes: d6de3097 ("[SCTP]: Add the handling of "Set Primary IP Address" parameter to INIT")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e40607cb
    • E
      net: introduce SO_INCOMING_CPU · 2c8c56e1
      Eric Dumazet 提交于
      Alternative to RPS/RFS is to use hardware support for multiple
      queues.
      
      Then split a set of million of sockets into worker threads, each
      one using epoll() to manage events on its own socket pool.
      
      Ideally, we want one thread per RX/TX queue/cpu, but we have no way to
      know after accept() or connect() on which queue/cpu a socket is managed.
      
      We normally use one cpu per RX queue (IRQ smp_affinity being properly
      set), so remembering on socket structure which cpu delivered last packet
      is enough to solve the problem.
      
      After accept(), connect(), or even file descriptor passing around
      processes, applications can use :
      
       int cpu;
       socklen_t len = sizeof(cpu);
      
       getsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, &len);
      
      And use this information to put the socket into the right silo
      for optimal performance, as all networking stack should run
      on the appropriate cpu, without need to send IPI (RPS/RFS).
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2c8c56e1
  19. 06 11月, 2014 1 次提交
    • D
      net: Add and use skb_copy_datagram_msg() helper. · 51f3d02b
      David S. Miller 提交于
      This encapsulates all of the skb_copy_datagram_iovec() callers
      with call argument signature "skb, offset, msghdr->msg_iov, length".
      
      When we move to iov_iters in the networking, the iov_iter object will
      sit in the msghdr.
      
      Having a helper like this means there will be less places to touch
      during that transformation.
      
      Based upon descriptions and patch from Al Viro.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      51f3d02b
  20. 31 10月, 2014 2 次提交
  21. 15 10月, 2014 3 次提交
    • D
      net: sctp: fix remote memory pressure from excessive queueing · 26b87c78
      Daniel Borkmann 提交于
      This scenario is not limited to ASCONF, just taken as one
      example triggering the issue. When receiving ASCONF probes
      in the form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---- ASCONF_a; [ASCONF_b; ...; ASCONF_n;] JUNK ------>
        [...]
        ---- ASCONF_m; [ASCONF_o; ...; ASCONF_z;] JUNK ------>
      
      ... where ASCONF_a, ASCONF_b, ..., ASCONF_z are good-formed
      ASCONFs and have increasing serial numbers, we process such
      ASCONF chunk(s) marked with !end_of_packet and !singleton,
      since we have not yet reached the SCTP packet end. SCTP does
      only do verification on a chunk by chunk basis, as an SCTP
      packet is nothing more than just a container of a stream of
      chunks which it eats up one by one.
      
      We could run into the case that we receive a packet with a
      malformed tail, above marked as trailing JUNK. All previous
      chunks are here goodformed, so the stack will eat up all
      previous chunks up to this point. In case JUNK does not fit
      into a chunk header and there are no more other chunks in
      the input queue, or in case JUNK contains a garbage chunk
      header, but the encoded chunk length would exceed the skb
      tail, or we came here from an entirely different scenario
      and the chunk has pdiscard=1 mark (without having had a flush
      point), it will happen, that we will excessively queue up
      the association's output queue (a correct final chunk may
      then turn it into a response flood when flushing the
      queue ;)): I ran a simple script with incremental ASCONF
      serial numbers and could see the server side consuming
      excessive amount of RAM [before/after: up to 2GB and more].
      
      The issue at heart is that the chunk train basically ends
      with !end_of_packet and !singleton markers and since commit
      2e3216cd ("sctp: Follow security requirement of responding
      with 1 packet") therefore preventing an output queue flush
      point in sctp_do_sm() -> sctp_cmd_interpreter() on the input
      chunk (chunk = event_arg) even though local_cork is set,
      but its precedence has changed since then. In the normal
      case, the last chunk with end_of_packet=1 would trigger the
      queue flush to accommodate possible outgoing bundling.
      
      In the input queue, sctp_inq_pop() seems to do the right thing
      in terms of discarding invalid chunks. So, above JUNK will
      not enter the state machine and instead be released and exit
      the sctp_assoc_bh_rcv() chunk processing loop. It's simply
      the flush point being missing at loop exit. Adding a try-flush
      approach on the output queue might not work as the underlying
      infrastructure might be long gone at this point due to the
      side-effect interpreter run.
      
      One possibility, albeit a bit of a kludge, would be to defer
      invalid chunk freeing into the state machine in order to
      possibly trigger packet discards and thus indirectly a queue
      flush on error. It would surely be better to discard chunks
      as in the current, perhaps better controlled environment, but
      going back and forth, it's simply architecturally not possible.
      I tried various trailing JUNK attack cases and it seems to
      look good now.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26b87c78
    • D
      net: sctp: fix panic on duplicate ASCONF chunks · b69040d8
      Daniel Borkmann 提交于
      When receiving a e.g. semi-good formed connection scan in the
      form of ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ---------------- ASCONF_a; ASCONF_b ----------------->
      
      ... where ASCONF_a equals ASCONF_b chunk (at least both serials
      need to be equal), we panic an SCTP server!
      
      The problem is that good-formed ASCONF chunks that we reply with
      ASCONF_ACK chunks are cached per serial. Thus, when we receive a
      same ASCONF chunk twice (e.g. through a lost ASCONF_ACK), we do
      not need to process them again on the server side (that was the
      idea, also proposed in the RFC). Instead, we know it was cached
      and we just resend the cached chunk instead. So far, so good.
      
      Where things get nasty is in SCTP's side effect interpreter, that
      is, sctp_cmd_interpreter():
      
      While incoming ASCONF_a (chunk = event_arg) is being marked
      !end_of_packet and !singleton, and we have an association context,
      we do not flush the outqueue the first time after processing the
      ASCONF_ACK singleton chunk via SCTP_CMD_REPLY. Instead, we keep it
      queued up, although we set local_cork to 1. Commit 2e3216cd
      changed the precedence, so that as long as we get bundled, incoming
      chunks we try possible bundling on outgoing queue as well. Before
      this commit, we would just flush the output queue.
      
      Now, while ASCONF_a's ASCONF_ACK sits in the corked outq, we
      continue to process the same ASCONF_b chunk from the packet. As
      we have cached the previous ASCONF_ACK, we find it, grab it and
      do another SCTP_CMD_REPLY command on it. So, effectively, we rip
      the chunk->list pointers and requeue the same ASCONF_ACK chunk
      another time. Since we process ASCONF_b, it's correctly marked
      with end_of_packet and we enforce an uncork, and thus flush, thus
      crashing the kernel.
      
      Fix it by testing if the ASCONF_ACK is currently pending and if
      that is the case, do not requeue it. When flushing the output
      queue we may relink the chunk for preparing an outgoing packet,
      but eventually unlink it when it's copied into the skb right
      before transmission.
      
      Joint work with Vlad Yasevich.
      
      Fixes: 2e3216cd ("sctp: Follow security requirement of responding with 1 packet")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b69040d8
    • D
      net: sctp: fix skb_over_panic when receiving malformed ASCONF chunks · 9de7922b
      Daniel Borkmann 提交于
      Commit 6f4c618d ("SCTP : Add paramters validity check for
      ASCONF chunk") added basic verification of ASCONF chunks, however,
      it is still possible to remotely crash a server by sending a
      special crafted ASCONF chunk, even up to pre 2.6.12 kernels:
      
      skb_over_panic: text:ffffffffa01ea1c3 len:31056 put:30768
       head:ffff88011bd81800 data:ffff88011bd81800 tail:0x7950
       end:0x440 dev:<NULL>
       ------------[ cut here ]------------
      kernel BUG at net/core/skbuff.c:129!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffff8144fb1c>] skb_put+0x5c/0x70
       [<ffffffffa01ea1c3>] sctp_addto_chunk+0x63/0xd0 [sctp]
       [<ffffffffa01eadaf>] sctp_process_asconf+0x1af/0x540 [sctp]
       [<ffffffff8152d025>] ? _read_unlock_bh+0x15/0x20
       [<ffffffffa01e0038>] sctp_sf_do_asconf+0x168/0x240 [sctp]
       [<ffffffffa01e3751>] sctp_do_sm+0x71/0x1210 [sctp]
       [<ffffffff8147645d>] ? fib_rules_lookup+0xad/0xf0
       [<ffffffffa01e6b22>] ? sctp_cmp_addr_exact+0x32/0x40 [sctp]
       [<ffffffffa01e8393>] sctp_assoc_bh_rcv+0xd3/0x180 [sctp]
       [<ffffffffa01ee986>] sctp_inq_push+0x56/0x80 [sctp]
       [<ffffffffa01fcc42>] sctp_rcv+0x982/0xa10 [sctp]
       [<ffffffffa01d5123>] ? ipt_local_in_hook+0x23/0x28 [iptable_filter]
       [<ffffffff8148bdc9>] ? nf_iterate+0x69/0xb0
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff8148bf86>] ? nf_hook_slow+0x76/0x120
       [<ffffffff81496d10>] ? ip_local_deliver_finish+0x0/0x2d0
       [<ffffffff81496ded>] ip_local_deliver_finish+0xdd/0x2d0
       [<ffffffff81497078>] ip_local_deliver+0x98/0xa0
       [<ffffffff8149653d>] ip_rcv_finish+0x12d/0x440
       [<ffffffff81496ac5>] ip_rcv+0x275/0x350
       [<ffffffff8145c88b>] __netif_receive_skb+0x4ab/0x750
       [<ffffffff81460588>] netif_receive_skb+0x58/0x60
      
      This can be triggered e.g., through a simple scripted nmap
      connection scan injecting the chunk after the handshake, for
      example, ...
      
        -------------- INIT[ASCONF; ASCONF_ACK] ------------->
        <----------- INIT-ACK[ASCONF; ASCONF_ACK] ------------
        -------------------- COOKIE-ECHO -------------------->
        <-------------------- COOKIE-ACK ---------------------
        ------------------ ASCONF; UNKNOWN ------------------>
      
      ... where ASCONF chunk of length 280 contains 2 parameters ...
      
        1) Add IP address parameter (param length: 16)
        2) Add/del IP address parameter (param length: 255)
      
      ... followed by an UNKNOWN chunk of e.g. 4 bytes. Here, the
      Address Parameter in the ASCONF chunk is even missing, too.
      This is just an example and similarly-crafted ASCONF chunks
      could be used just as well.
      
      The ASCONF chunk passes through sctp_verify_asconf() as all
      parameters passed sanity checks, and after walking, we ended
      up successfully at the chunk end boundary, and thus may invoke
      sctp_process_asconf(). Parameter walking is done with
      WORD_ROUND() to take padding into account.
      
      In sctp_process_asconf()'s TLV processing, we may fail in
      sctp_process_asconf_param() e.g., due to removal of the IP
      address that is also the source address of the packet containing
      the ASCONF chunk, and thus we need to add all TLVs after the
      failure to our ASCONF response to remote via helper function
      sctp_add_asconf_response(), which basically invokes a
      sctp_addto_chunk() adding the error parameters to the given
      skb.
      
      When walking to the next parameter this time, we proceed
      with ...
      
        length = ntohs(asconf_param->param_hdr.length);
        asconf_param = (void *)asconf_param + length;
      
      ... instead of the WORD_ROUND()'ed length, thus resulting here
      in an off-by-one that leads to reading the follow-up garbage
      parameter length of 12336, and thus throwing an skb_over_panic
      for the reply when trying to sctp_addto_chunk() next time,
      which implicitly calls the skb_put() with that length.
      
      Fix it by using sctp_walk_params() [ which is also used in
      INIT parameter processing ] macro in the verification *and*
      in ASCONF processing: it will make sure we don't spill over,
      that we walk parameters WORD_ROUND()'ed. Moreover, we're being
      more defensive and guard against unknown parameter types and
      missized addresses.
      
      Joint work with Vlad Yasevich.
      
      Fixes: b896b82be4ae ("[SCTP] ADDIP: Support for processing incoming ASCONF_ACK chunks.")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9de7922b
  22. 06 10月, 2014 1 次提交
    • V
      sctp: handle association restarts when the socket is closed. · bdf6fa52
      Vlad Yasevich 提交于
      Currently association restarts do not take into consideration the
      state of the socket.  When a restart happens, the current assocation
      simply transitions into established state.  This creates a condition
      where a remote system, through a the restart procedure, may create a
      local association that is no way reachable by user.  The conditions
      to trigger this are as follows:
        1) Remote does not acknoledge some data causing data to remain
           outstanding.
        2) Local application calls close() on the socket.  Since data
           is still outstanding, the association is placed in SHUTDOWN_PENDING
           state.  However, the socket is closed.
        3) The remote tries to create a new association, triggering a restart
           on the local system.  The association moves from SHUTDOWN_PENDING
           to ESTABLISHED.  At this point, it is no longer reachable by
           any socket on the local system.
      
      This patch addresses the above situation by moving the newly ESTABLISHED
      association into SHUTDOWN-SENT state and bundling a SHUTDOWN after
      the COOKIE-ACK chunk.  This way, the restarted associate immidiately
      enters the shutdown procedure and forces the termination of the
      unreachable association.
      Reported-by: NDavid Laight <David.Laight@aculab.com>
      Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bdf6fa52
  23. 10 9月, 2014 1 次提交
  24. 08 9月, 2014 1 次提交
    • T
      percpu_counter: add @gfp to percpu_counter_init() · 908c7f19
      Tejun Heo 提交于
      Percpu allocator now supports allocation mask.  Add @gfp to
      percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
      with percpu_counters too.
      
      We could have left percpu_counter_init() alone and added
      percpu_counter_init_gfp(); however, the number of users isn't that
      high and introducing _gfp variants to all percpu data structures would
      be quite ugly, so let's just do the conversion.  This is the one with
      the most users.  Other percpu data structures are a lot easier to
      convert.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJan Kara <jack@suse.cz>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Cc: x86@kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      908c7f19
  25. 30 8月, 2014 2 次提交
    • T
      sctp: Change sctp to implement csum_levels · 202863fe
      Tom Herbert 提交于
      CHECKSUM_UNNECESSARY may be applied to the SCTP CRC so we need to
      appropriate account for this by decrementing csum_level. This is
      done by calling __skb_dec_checksum_unnecessary.
      Signed-off-by: NTom Herbert <therbert@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      202863fe
    • D
      net: sctp: fix ABI mismatch through sctp_assoc_to_state helper · 38ab1fa9
      Daniel Borkmann 提交于
      Since SCTP day 1, that is, 19b55a2af145 ("Initial commit") from lksctp
      tree, the official <netinet/sctp.h> header carries a copy of enum
      sctp_sstat_state that looks like (compared to the current in-kernel
      enumeration):
      
        User definition:                     Kernel definition:
      
        enum sctp_sstat_state {              typedef enum {
          SCTP_EMPTY             = 0,          <removed>
          SCTP_CLOSED            = 1,          SCTP_STATE_CLOSED            = 0,
          SCTP_COOKIE_WAIT       = 2,          SCTP_STATE_COOKIE_WAIT       = 1,
          SCTP_COOKIE_ECHOED     = 3,          SCTP_STATE_COOKIE_ECHOED     = 2,
          SCTP_ESTABLISHED       = 4,          SCTP_STATE_ESTABLISHED       = 3,
          SCTP_SHUTDOWN_PENDING  = 5,          SCTP_STATE_SHUTDOWN_PENDING  = 4,
          SCTP_SHUTDOWN_SENT     = 6,          SCTP_STATE_SHUTDOWN_SENT     = 5,
          SCTP_SHUTDOWN_RECEIVED = 7,          SCTP_STATE_SHUTDOWN_RECEIVED = 6,
          SCTP_SHUTDOWN_ACK_SENT = 8,          SCTP_STATE_SHUTDOWN_ACK_SENT = 7,
        };                                   } sctp_state_t;
      
      This header was later on also placed into the uapi, so that user space
      programs can compile without having <netinet/sctp.h>, but the shipped
      with <linux/sctp.h> instead.
      
      While RFC6458 under 8.2.1.Association Status (SCTP_STATUS) says that
      sstat_state can range from SCTP_CLOSED to SCTP_SHUTDOWN_ACK_SENT, we
      nevertheless have a what it appears to be dummy SCTP_EMPTY state from
      the very early days.
      
      While it seems to do just nothing, commit 0b8f9e25 ("sctp: remove
      completely unsed EMPTY state") did the right thing and removed this dead
      code. That however, causes an off-by-one when the user asks the SCTP
      stack via SCTP_STATUS API and checks for the current socket state thus
      yielding possibly undefined behaviour in applications as they expect
      the kernel to tell the right thing.
      
      The enumeration had to be changed however as based on the current socket
      state, we access a function pointer lookup-table through this. Therefore,
      I think the best way to deal with this is just to add a helper function
      sctp_assoc_to_state() to encapsulate the off-by-one quirk.
      Reported-by: NTristan Su <sooqing@gmail.com>
      Fixes: 0b8f9e25 ("sctp: remove completely unsed EMPTY state")
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      38ab1fa9
  26. 23 8月, 2014 2 次提交
    • D
      net: sctp: fix suboptimal edge-case on non-active active/retrans path selection · aa4a83ee
      Daniel Borkmann 提交于
      In SCTP, selection of active (T.ACT) and retransmission (T.RET)
      transports is being done whenever transport control operations
      (UP, DOWN, PF, ...) are engaged through sctp_assoc_control_transport().
      
      Commits 4c47af4d ("net: sctp: rework multihoming retransmission
      path selection to rfc4960") and a7288c4d ("net: sctp: improve
      sctp_select_active_and_retran_path selection") have both improved
      it towards a more fine-grained and optimal path selection.
      
      Currently, the selection algorithm for T.ACT and T.RET is as follows:
      
      1) Elect the two most recently used ACTIVE transports T1, T2 for
         T.ACT, T.RET, where T.ACT<-T1 and T1 is most recently used
      2) In case primary path T.PRI not in {T1, T2} but ACTIVE, set
         T.ACT<-T.PRI and T.RET<-T1
      3) If only T1 is ACTIVE from the set, set T.ACT<-T1 and T.RET<-T1
      4) If none is ACTIVE, set T.ACT<-best(T.PRI, T.RET, T3) where
         T3 is the most recently used (if avail) in PF, set T.RET<-T.PRI
      
      Prior to above commits, 4) was simply a camp on T.ACT<-T.PRI and
      T.RET<-T.PRI, ignoring possible paths in PF. Camping on T.PRI is
      still slightly suboptimal as it can lead to the following scenario:
      
      Setup:
              <A>                                <B>
          T1: p1p1 (10.0.10.10) <==>  .'`)  <==> p1p1 (10.0.10.12)  <= T.PRI
          T2: p1p2 (10.0.10.20) <==> (_ . ) <==> p1p2 (10.0.10.22)
      
          net.sctp.rto_min = 1000
          net.sctp.path_max_retrans = 2
          net.sctp.pf_retrans = 0
          net.sctp.hb_interval = 1000
      
      T.PRI is permanently down, T2 is put briefly into PF state (e.g. due to
      link flapping). Here, the first time transmission is sent over PF path
      T2 as it's the only non-INACTIVE path, but the retransmitted data-chunks
      are sent over the INACTIVE path T1 (T.PRI), which is not good.
      
      After the patch, it's choosing better transports in both cases by
      modifying step 4):
      
      4) If none is ACTIVE, set T.ACT_new<-best(T.ACT_old, T3) where T3 is
         the most recently used (if avail) in PF, set T.RET<-T.ACT_new
      
      This will still select a best possible path in PF if available (which
      can also include T.PRI/T.RET), and set both T.ACT/T.RET to it.
      
      In case sctp_assoc_control_transport() *just* put T.ACT_old into INACTIVE
      as it transitioned from ACTIVE->PF->INACTIVE and stays in INACTIVE just
      for a very short while before going back ACTIVE, it will guarantee that
      this path will be reselected for T.ACT/T.RET since T3 (PF) is not
      available.
      
      Previously, this was not possible, as we would only select between T.PRI
      and T.RET, and a possible T3 would be NULL due to the fact that we have
      just transitioned T3 in sctp_assoc_control_transport() from PF->INACTIVE
      and would select a suboptimal path when T.PRI/T.RET have worse properties.
      
      In the case that T.ACT_old permanently went to INACTIVE during this
      transition and there's no PF path available, plus T.PRI and T.RET are
      INACTIVE as well, we would now camp on T.ACT_old, but if everything is
      being INACTIVE there's really not much we can do except hoping for a
      successful HB to bring one of the transports back up again and, thus
      cause a new selection through sctp_assoc_control_transport().
      
      Now both tests work fine:
      
      Case 1:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(PF)
      
       3. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       5. T1 S(PF) T.ACT, T.RET
          T2 S(INACTIVE)
      
      [ 5.1 T1 S(INACTIVE) T.ACT, T.RET
            T2 S(INACTIVE) ]
      
       6. T1 S(ACTIVE) T.ACT, T.RET
          T2 S(INACTIVE)
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
      Case 2:
      
       1. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      
       2. T1 S(PF)
          T2 S(ACTIVE) T.ACT, T.RET
      
       3. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       5. T1 S(INACTIVE)
          T2 S(PF) T.ACT, T.RET
      
      [ 5.1 T1 S(INACTIVE)
            T2 S(INACTIVE) T.ACT, T.RET ]
      
       6. T1 S(INACTIVE)
          T2 S(ACTIVE) T.ACT, T.RET
      
       7. T1 S(ACTIVE) T.ACT
          T2 S(ACTIVE) T.RET
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aa4a83ee
    • D
      net: sctp: spare unnecessary comparison in sctp_trans_elect_best · ea4f19c1
      Daniel Borkmann 提交于
      When both transports are the same, we don't have to go down that
      road only to realize that we will return the very same transport.
      We are guaranteed that curr is always non-NULL. Therefore, just
      short-circuit this special case.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NVlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea4f19c1
  27. 22 8月, 2014 1 次提交
  28. 06 8月, 2014 1 次提交
    • E
      sctp: fix possible seqlock seadlock in sctp_packet_transmit() · 757efd32
      Eric Dumazet 提交于
      Dave reported following splat, caused by improper use of
      IP_INC_STATS_BH() in process context.
      
      BUG: using __this_cpu_add() in preemptible [00000000] code: trinity-c117/14551
      caller is __this_cpu_preempt_check+0x13/0x20
      CPU: 3 PID: 14551 Comm: trinity-c117 Not tainted 3.16.0+ #33
       ffffffff9ec898f0 0000000047ea7e23 ffff88022d32f7f0 ffffffff9e7ee207
       0000000000000003 ffff88022d32f818 ffffffff9e397eaa ffff88023ee70b40
       ffff88022d32f970 ffff8801c026d580 ffff88022d32f828 ffffffff9e397ee3
      Call Trace:
       [<ffffffff9e7ee207>] dump_stack+0x4e/0x7a
       [<ffffffff9e397eaa>] check_preemption_disabled+0xfa/0x100
       [<ffffffff9e397ee3>] __this_cpu_preempt_check+0x13/0x20
       [<ffffffffc0839872>] sctp_packet_transmit+0x692/0x710 [sctp]
       [<ffffffffc082a7f2>] sctp_outq_flush+0x2a2/0xc30 [sctp]
       [<ffffffff9e0d985c>] ? mark_held_locks+0x7c/0xb0
       [<ffffffff9e7f8c6d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
       [<ffffffffc082b99a>] sctp_outq_uncork+0x1a/0x20 [sctp]
       [<ffffffffc081e112>] sctp_cmd_interpreter.isra.23+0x1142/0x13f0 [sctp]
       [<ffffffffc081c86b>] sctp_do_sm+0xdb/0x330 [sctp]
       [<ffffffff9e0b8f1b>] ? preempt_count_sub+0xab/0x100
       [<ffffffffc083b350>] ? sctp_cname+0x70/0x70 [sctp]
       [<ffffffffc08389ca>] sctp_primitive_ASSOCIATE+0x3a/0x50 [sctp]
       [<ffffffffc083358f>] sctp_sendmsg+0x88f/0xe30 [sctp]
       [<ffffffff9e0d673a>] ? lock_release_holdtime.part.28+0x9a/0x160
       [<ffffffff9e0d62ce>] ? put_lock_stats.isra.27+0xe/0x30
       [<ffffffff9e73b624>] inet_sendmsg+0x104/0x220
       [<ffffffff9e73b525>] ? inet_sendmsg+0x5/0x220
       [<ffffffff9e68ac4e>] sock_sendmsg+0x9e/0xe0
       [<ffffffff9e1c0c09>] ? might_fault+0xb9/0xc0
       [<ffffffff9e1c0bae>] ? might_fault+0x5e/0xc0
       [<ffffffff9e68b234>] SYSC_sendto+0x124/0x1c0
       [<ffffffff9e0136b0>] ? syscall_trace_enter+0x250/0x330
       [<ffffffff9e68c3ce>] SyS_sendto+0xe/0x10
       [<ffffffff9e7f9be4>] tracesys+0xdd/0xe2
      
      This is a followup of commits f1d8cba6 ("inet: fix possible
      seqlock deadlocks") and 7f88c6b2 ("ipv6: fix possible seqlock
      deadlock in ip6_finish_output2")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Reported-by: NDave Jones <davej@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      757efd32
  29. 01 8月, 2014 2 次提交
    • D
      net: fix the counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS · 7304fe46
      Duan Jiong 提交于
      When dealing with ICMPv[46] Error Message, function icmp_socket_deliver()
      and icmpv6_notify() do some valid checks on packet's length, but then some
      protocols check packet's length redaudantly. So remove those duplicated
      statements, and increase counter ICMP_MIB_INERRORS/ICMP6_MIB_INERRORS in
      function icmp_socket_deliver() and icmpv6_notify() respectively.
      
      In addition, add missed counter in udp6/udplite6 when socket is NULL.
      Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7304fe46
    • J
      sctp: Fixup v4mapped behaviour to comply with Sock API · 299ee123
      Jason Gunthorpe 提交于
      The SCTP socket extensions API document describes the v4mapping option as
      follows:
      
      8.1.15.  Set/Clear IPv4 Mapped Addresses (SCTP_I_WANT_MAPPED_V4_ADDR)
      
         This socket option is a Boolean flag which turns on or off the
         mapping of IPv4 addresses.  If this option is turned on, then IPv4
         addresses will be mapped to V6 representation.  If this option is
         turned off, then no mapping will be done of V4 addresses and a user
         will receive both PF_INET6 and PF_INET type addresses on the socket.
         See [RFC3542] for more details on mapped V6 addresses.
      
      This description isn't really in line with what the code does though.
      
      Introduce addr_to_user (renamed addr_v4map), which should be called
      before any sockaddr is passed back to user space. The new function
      places the sockaddr into the correct format depending on the
      SCTP_I_WANT_MAPPED_V4_ADDR option.
      
      Audit all places that touched v4mapped and either sanely construct
      a v4 or v6 address then call addr_to_user, or drop the
      unnecessary v4mapped check entirely.
      
      Audit all places that call addr_to_user and verify they are on a sycall
      return path.
      
      Add a custom getname that formats the address properly.
      
      Several bugs are addressed:
       - SCTP_I_WANT_MAPPED_V4_ADDR=0 often returned garbage for
         addresses to user space
       - The addr_len returned from recvmsg was not correct when
         returning AF_INET on a v6 socket
       - flowlabel and scope_id were not zerod when promoting
         a v4 to v6
       - Some syscalls like bind and connect behaved differently
         depending on v4mapped
      
      Tested bind, getpeername, getsockname, connect, and recvmsg for proper
      behaviour in v4mapped = 1 and 0 cases.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      Tested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      299ee123