1. 04 12月, 2017 1 次提交
    • E
      tcp: add tcp_v4_fill_cb()/tcp_v4_restore_cb() · eeea10b8
      Eric Dumazet 提交于
      James Morris reported kernel stack corruption bug [1] while
      running the SELinux testsuite, and bisected to a recent
      commit bffa72cf ("net: sk_buff rbnode reorg")
      
      We believe this commit is fine, but exposes an older bug.
      
      SELinux code runs from tcp_filter() and might send an ICMP,
      expecting IP options to be found in skb->cb[] using regular IPCB placement.
      
      We need to defer TCP mangling of skb->cb[] after tcp_filter() calls.
      
      This patch adds tcp_v4_fill_cb()/tcp_v4_restore_cb() in a very
      similar way we added them for IPv6.
      
      [1]
      [  339.806024] SELinux: failure in selinux_parse_skb(), unable to parse packet
      [  339.822505] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff81745af5
      [  339.822505]
      [  339.852250] CPU: 4 PID: 3642 Comm: client Not tainted 4.15.0-rc1-test #15
      [  339.868498] Hardware name: LENOVO 10FGS0VA1L/30BC, BIOS FWKT68A   01/19/2017
      [  339.885060] Call Trace:
      [  339.896875]  <IRQ>
      [  339.908103]  dump_stack+0x63/0x87
      [  339.920645]  panic+0xe8/0x248
      [  339.932668]  ? ip_push_pending_frames+0x33/0x40
      [  339.946328]  ? icmp_send+0x525/0x530
      [  339.958861]  ? kfree_skbmem+0x60/0x70
      [  339.971431]  __stack_chk_fail+0x1b/0x20
      [  339.984049]  icmp_send+0x525/0x530
      [  339.996205]  ? netlbl_skbuff_err+0x36/0x40
      [  340.008997]  ? selinux_netlbl_err+0x11/0x20
      [  340.021816]  ? selinux_socket_sock_rcv_skb+0x211/0x230
      [  340.035529]  ? security_sock_rcv_skb+0x3b/0x50
      [  340.048471]  ? sk_filter_trim_cap+0x44/0x1c0
      [  340.061246]  ? tcp_v4_inbound_md5_hash+0x69/0x1b0
      [  340.074562]  ? tcp_filter+0x2c/0x40
      [  340.086400]  ? tcp_v4_rcv+0x820/0xa20
      [  340.098329]  ? ip_local_deliver_finish+0x71/0x1a0
      [  340.111279]  ? ip_local_deliver+0x6f/0xe0
      [  340.123535]  ? ip_rcv_finish+0x3a0/0x3a0
      [  340.135523]  ? ip_rcv_finish+0xdb/0x3a0
      [  340.147442]  ? ip_rcv+0x27c/0x3c0
      [  340.158668]  ? inet_del_offload+0x40/0x40
      [  340.170580]  ? __netif_receive_skb_core+0x4ac/0x900
      [  340.183285]  ? rcu_accelerate_cbs+0x5b/0x80
      [  340.195282]  ? __netif_receive_skb+0x18/0x60
      [  340.207288]  ? process_backlog+0x95/0x140
      [  340.218948]  ? net_rx_action+0x26c/0x3b0
      [  340.230416]  ? __do_softirq+0xc9/0x26a
      [  340.241625]  ? do_softirq_own_stack+0x2a/0x40
      [  340.253368]  </IRQ>
      [  340.262673]  ? do_softirq+0x50/0x60
      [  340.273450]  ? __local_bh_enable_ip+0x57/0x60
      [  340.285045]  ? ip_finish_output2+0x175/0x350
      [  340.296403]  ? ip_finish_output+0x127/0x1d0
      [  340.307665]  ? nf_hook_slow+0x3c/0xb0
      [  340.318230]  ? ip_output+0x72/0xe0
      [  340.328524]  ? ip_fragment.constprop.54+0x80/0x80
      [  340.340070]  ? ip_local_out+0x35/0x40
      [  340.350497]  ? ip_queue_xmit+0x15c/0x3f0
      [  340.361060]  ? __kmalloc_reserve.isra.40+0x31/0x90
      [  340.372484]  ? __skb_clone+0x2e/0x130
      [  340.382633]  ? tcp_transmit_skb+0x558/0xa10
      [  340.393262]  ? tcp_connect+0x938/0xad0
      [  340.403370]  ? ktime_get_with_offset+0x4c/0xb0
      [  340.414206]  ? tcp_v4_connect+0x457/0x4e0
      [  340.424471]  ? __inet_stream_connect+0xb3/0x300
      [  340.435195]  ? inet_stream_connect+0x3b/0x60
      [  340.445607]  ? SYSC_connect+0xd9/0x110
      [  340.455455]  ? __audit_syscall_entry+0xaf/0x100
      [  340.466112]  ? syscall_trace_enter+0x1d0/0x2b0
      [  340.476636]  ? __audit_syscall_exit+0x209/0x290
      [  340.487151]  ? SyS_connect+0xe/0x10
      [  340.496453]  ? do_syscall_64+0x67/0x1b0
      [  340.506078]  ? entry_SYSCALL64_slow_path+0x25/0x25
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJames Morris <james.l.morris@oracle.com>
      Tested-by: NJames Morris <james.l.morris@oracle.com>
      Tested-by: NCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeea10b8
  2. 03 12月, 2017 1 次提交
  3. 02 12月, 2017 5 次提交
    • T
      tipc: call tipc_rcv() only if bearer is up in tipc_udp_recv() · c7799c06
      Tommi Rantala 提交于
      Remove the second tipc_rcv() call in tipc_udp_recv(). We have just
      checked that the bearer is not up, and calling tipc_rcv() with a bearer
      that is not up leads to a TIPC div-by-zero crash in
      tipc_node_calculate_timer(). The crash is rare in practice, but can
      happen like this:
      
        We're enabling a bearer, but it's not yet up and fully initialized.
        At the same time we receive a discovery packet, and in tipc_udp_recv()
        we end up calling tipc_rcv() with the not-yet-initialized bearer,
        causing later the div-by-zero crash in tipc_node_calculate_timer().
      
      Jon Maloy explains the impact of removing the second tipc_rcv() call:
        "link setup in the worst case will be delayed until the next arriving
         discovery messages, 1 sec later, and this is an acceptable delay."
      
      As the tipc_rcv() call is removed, just leave the function via the
      rcu_out label, so that we will kfree_skb().
      
      [   12.590450] Own node address <1.1.1>, network identity 1
      [   12.668088] divide error: 0000 [#1] SMP
      [   12.676952] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.14.2-dirty #1
      [   12.679225] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
      [   12.682095] task: ffff8c2a761edb80 task.stack: ffffa41cc0cac000
      [   12.684087] RIP: 0010:tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc]
      [   12.686486] RSP: 0018:ffff8c2a7fc838a0 EFLAGS: 00010246
      [   12.688451] RAX: 0000000000000000 RBX: ffff8c2a5b382600 RCX: 0000000000000000
      [   12.691197] RDX: 0000000000000000 RSI: ffff8c2a5b382600 RDI: ffff8c2a5b382600
      [   12.693945] RBP: ffff8c2a7fc838b0 R08: 0000000000000001 R09: 0000000000000001
      [   12.696632] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8c2a5d8949d8
      [   12.699491] R13: ffffffff95ede400 R14: 0000000000000000 R15: ffff8c2a5d894800
      [   12.702338] FS:  0000000000000000(0000) GS:ffff8c2a7fc80000(0000) knlGS:0000000000000000
      [   12.705099] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   12.706776] CR2: 0000000001bb9440 CR3: 00000000bd009001 CR4: 00000000003606e0
      [   12.708847] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   12.711016] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   12.712627] Call Trace:
      [   12.713390]  <IRQ>
      [   12.714011]  tipc_node_check_dest+0x2e8/0x350 [tipc]
      [   12.715286]  tipc_disc_rcv+0x14d/0x1d0 [tipc]
      [   12.716370]  tipc_rcv+0x8b0/0xd40 [tipc]
      [   12.717396]  ? minmax_running_min+0x2f/0x60
      [   12.718248]  ? dst_alloc+0x4c/0xa0
      [   12.718964]  ? tcp_ack+0xaf1/0x10b0
      [   12.719658]  ? tipc_udp_is_known_peer+0xa0/0xa0 [tipc]
      [   12.720634]  tipc_udp_recv+0x71/0x1d0 [tipc]
      [   12.721459]  ? dst_alloc+0x4c/0xa0
      [   12.722130]  udp_queue_rcv_skb+0x264/0x490
      [   12.722924]  __udp4_lib_rcv+0x21e/0x990
      [   12.723670]  ? ip_route_input_rcu+0x2dd/0xbf0
      [   12.724442]  ? tcp_v4_rcv+0x958/0xa40
      [   12.725039]  udp_rcv+0x1a/0x20
      [   12.725587]  ip_local_deliver_finish+0x97/0x1d0
      [   12.726323]  ip_local_deliver+0xaf/0xc0
      [   12.726959]  ? ip_route_input_noref+0x19/0x20
      [   12.727689]  ip_rcv_finish+0xdd/0x3b0
      [   12.728307]  ip_rcv+0x2ac/0x360
      [   12.728839]  __netif_receive_skb_core+0x6fb/0xa90
      [   12.729580]  ? udp4_gro_receive+0x1a7/0x2c0
      [   12.730274]  __netif_receive_skb+0x1d/0x60
      [   12.730953]  ? __netif_receive_skb+0x1d/0x60
      [   12.731637]  netif_receive_skb_internal+0x37/0xd0
      [   12.732371]  napi_gro_receive+0xc7/0xf0
      [   12.732920]  receive_buf+0x3c3/0xd40
      [   12.733441]  virtnet_poll+0xb1/0x250
      [   12.733944]  net_rx_action+0x23e/0x370
      [   12.734476]  __do_softirq+0xc5/0x2f8
      [   12.734922]  irq_exit+0xfa/0x100
      [   12.735315]  do_IRQ+0x4f/0xd0
      [   12.735680]  common_interrupt+0xa2/0xa2
      [   12.736126]  </IRQ>
      [   12.736416] RIP: 0010:native_safe_halt+0x6/0x10
      [   12.736925] RSP: 0018:ffffa41cc0cafe90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff4d
      [   12.737756] RAX: 0000000000000000 RBX: ffff8c2a761edb80 RCX: 0000000000000000
      [   12.738504] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      [   12.739258] RBP: ffffa41cc0cafe90 R08: 0000014b5b9795e5 R09: ffffa41cc12c7e88
      [   12.740118] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000002
      [   12.740964] R13: ffff8c2a761edb80 R14: 0000000000000000 R15: 0000000000000000
      [   12.741831]  default_idle+0x2a/0x100
      [   12.742323]  arch_cpu_idle+0xf/0x20
      [   12.742796]  default_idle_call+0x28/0x40
      [   12.743312]  do_idle+0x179/0x1f0
      [   12.743761]  cpu_startup_entry+0x1d/0x20
      [   12.744291]  start_secondary+0x112/0x120
      [   12.744816]  secondary_startup_64+0xa5/0xa5
      [   12.745367] Code: b9 f4 01 00 00 48 89 c2 48 c1 ea 02 48 3d d3 07 00
      00 48 0f 47 d1 49 8b 0c 24 48 39 d1 76 07 49 89 14 24 48 89 d1 31 d2 48
      89 df <48> f7 f1 89 c6 e8 81 6e ff ff 5b 41 5c 5d c3 66 90 66 2e 0f 1f
      [   12.747527] RIP: tipc_node_calculate_timer.isra.12+0x45/0x60 [tipc] RSP: ffff8c2a7fc838a0
      [   12.748555] ---[ end trace 1399ab83390650fd ]---
      [   12.749296] Kernel panic - not syncing: Fatal exception in interrupt
      [   12.750123] Kernel Offset: 0x13200000 from 0xffffffff82000000
      (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [   12.751215] Rebooting in 60 seconds..
      
      Fixes: c9b64d49 ("tipc: add replicast peer discovery")
      Signed-off-by: NTommi Rantala <tommi.t.rantala@nokia.com>
      Cc: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7799c06
    • E
      tcp/dccp: block bh before arming time_wait timer · cfac7f83
      Eric Dumazet 提交于
      Maciej Żenczykowski reported some panics in tcp_twsk_destructor()
      that might be caused by the following bug.
      
      timewait timer is pinned to the cpu, because we want to transition
      timwewait refcount from 0 to 4 in one go, once everything has been
      initialized.
      
      At the time commit ed2e9239 ("tcp/dccp: fix timewait races in timer
      handling") was merged, TCP was always running from BH habdler.
      
      After commit 5413d1ba ("net: do not block BH while processing
      socket backlog") we definitely can run tcp_time_wait() from process
      context.
      
      We need to block BH in the critical section so that the pinned timer
      has still its purpose.
      
      This bug is more likely to happen under stress and when very small RTO
      are used in datacenter flows.
      
      Fixes: 5413d1ba ("net: do not block BH while processing socket backlog")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMaciej Żenczykowski <maze@google.com>
      Acked-by: NMaciej Żenczykowski <maze@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cfac7f83
    • X
      sctp: do not abandon the other frags in unsent outq if one msg has outstanding frags · 779edd73
      Xin Long 提交于
      Now for the abandoned chunks in unsent outq, it would just free the chunks.
      Because no tsn is assigned to them yet, there's no need to send fwd tsn to
      peer, unlike for the abandoned chunks in sent outq.
      
      The problem is when parts of the msg have been sent and the other frags
      are still in unsent outq, if they are abandoned/dropped, the peer would
      never get this msg reassembled.
      
      So these frags in unsent outq can't be dropped if this msg already has
      outstanding frags.
      
      This patch does the check in sctp_chunk_abandoned and
      sctp_prsctp_prune_unsent.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      779edd73
    • X
      sctp: abandon the whole msg if one part of a fragmented message is abandoned · e5f61296
      Xin Long 提交于
      As rfc3758#section-3.1 demands:
      
         A3) When a TSN is "abandoned", if it is part of a fragmented message,
             all other TSN's within that fragmented message MUST be abandoned
             at the same time.
      
      Besides, if it couldn't handle this, the rest frags would never get
      assembled in peer side.
      
      This patch supports it by adding abandoned flag in sctp_datamsg, when
      one chunk is being abandoned, set chunk->msg->abandoned as well. Next
      time when checking for abandoned, go checking chunk->msg->abandoned
      first.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e5f61296
    • X
      sctp: only update outstanding_bytes for transmitted queue when doing prsctp_prune · d30fc512
      Xin Long 提交于
      Now outstanding_bytes is only increased when appending chunks into one
      packet and sending it at 1st time, while decreased when it is about to
      move into retransmit queue. It means outstanding_bytes value is already
      decreased for all chunks in retransmit queue.
      
      However sctp_prsctp_prune_sent is a common function to check the chunks
      in both transmitted and retransmit queue, it decrease outstanding_bytes
      when moving a chunk into abandoned queue from either of them.
      
      It could cause outstanding_bytes underflow, as it also decreases it's
      value for the chunks in retransmit queue.
      
      This patch fixes it by only updating outstanding_bytes for transmitted
      queue when pruning queues for prsctp prio policy, the same fix is also
      needed in sctp_check_transmitted.
      
      Fixes: 8dbdf1f5 ("sctp: implement prsctp PRIO policy")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d30fc512
  4. 30 11月, 2017 3 次提交
  5. 29 11月, 2017 12 次提交
    • G
      rxrpc: Fix variable overwrite · 282ef472
      Gustavo A. R. Silva 提交于
      Values assigned to both variable resend_at and ack_at are overwritten
      before they can be used.
      
      The correct fix here is to add 'now' to the previously computed value in
      resend_at and ack_at.
      
      Addresses-Coverity-ID: 1462262
      Addresses-Coverity-ID: 1462263
      Addresses-Coverity-ID: 1462264
      Fixes: beb8e5e4 ("rxrpc: Express protocol timeouts in terms of RTT")
      Link: https://marc.info/?i=17004.1511808959%40warthog.procyon.org.ukSigned-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      282ef472
    • D
      rxrpc: Fix ACK generation from the connection event processor · 5fc62f6a
      David Howells 提交于
      Repeat terminal ACKs and now terminal ACKs are now generated from the
      connection event processor rather from call handling as this allows us to
      discard client call structures as soon as possible and free up the channel
      for a follow on call.
      
      However, in ACKs so generated, the additional information trailer is
      malformed because the padding that's meant to be in the middle isn't
      included in what's transmitted.
      
      Fix it so that the 3 bytes of padding are included in the transmission.
      
      Further, the trailer is misaligned because of the padding, so assigment to
      the u16 and u32 fields inside it might cause problems on some arches, so
      fix this by breaking the padding and the trailer out of the packed struct.
      
      (This also deals with potential compiler weirdies where some of the nested
      structs are packed and some aren't).
      
      The symptoms can be seen in wireshark as terminal DUPLICATE or IDLE ACK
      packets in which the Max MTU, Interface MTU and rwind fields have weird
      values and the Max Packets field is apparently missing.
      Reported-by: NJeffrey Altman <jaltman@auristor.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      5fc62f6a
    • D
      rxrpc: Clean up whitespace · 3d7682af
      David Howells 提交于
      Clean up some whitespace from rxrpc.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3d7682af
    • J
      net: sched: cbq: create block for q->link.block · d51aae68
      Jiri Pirko 提交于
      q->link.block is not initialized, that leads to EINVAL when one tries to
      add filter there. So initialize it properly.
      
      This can be reproduced by:
      $ tc qdisc add dev eth0 root handle 1: cbq avpkt 1000 rate 1000Mbit bandwidth 1000Mbit
      $ tc filter add dev eth0 parent 1: protocol ip prio 100 u32 match ip protocol 0 0x00 flowid 1:1
      Reported-by: NJaroslav Aster <jaster@redhat.com>
      Reported-by: NIvan Vecera <ivecera@redhat.com>
      Fixes: 6529eaba ("net: sched: introduce tcf block infractructure")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NEelco Chaudron <echaudro@redhat.com>
      Reviewed-by: NIvan Vecera <ivecera@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d51aae68
    • J
      VSOCK: Don't set sk_state to TCP_CLOSE before testing it · 4a5def7f
      Jorgen Hansen 提交于
      A recent commit (3b4477d2) converted the sk_state to use
      TCP constants. In that change, vmci_transport_handle_detach
      was changed such that sk->sk_state was set to TCP_CLOSE before
      we test whether it is TCP_SYN_SENT. This change moves the
      sk_state change back to the original locations in that function.
      Signed-off-by: NJorgen Hansen <jhansen@vmware.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a5def7f
    • X
      sctp: use right member as the param of list_for_each_entry · a8dd3979
      Xin Long 提交于
      Commit d04adf1b ("sctp: reset owner sk for data chunks on out queues
      when migrating a sock") made a mistake that using 'list' as the param of
      list_for_each_entry to traverse the retransmit, sacked and abandoned
      queues, while chunks are using 'transmitted_list' to link into these
      queues.
      
      It could cause NULL dereference panic if there are chunks in any of these
      queues when peeling off one asoc.
      
      So use the chunk member 'transmitted_list' instead in this patch.
      
      Fixes: d04adf1b ("sctp: reset owner sk for data chunks on out queues when migrating a sock")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8dd3979
    • P
      sch_sfq: fix null pointer dereference at timer expiration · f85729d0
      Paolo Abeni 提交于
      While converting sch_sfq to use timer_setup(), the commit cdeabbb8
      ("net: sched: Convert timers to use timer_setup()") forgot to
      initialize the 'sch' field. As a result, the timer callback tries to
      dereference a NULL pointer, and the kernel does oops.
      
      Fix it initializing such field at qdisc creation time.
      
      Fixes: cdeabbb8 ("net: sched: Convert timers to use timer_setup()")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f85729d0
    • J
      cls_bpf: don't decrement net's refcount when offload fails · 25415cec
      Jakub Kicinski 提交于
      When cls_bpf offload was added it seemed like a good idea to
      call cls_bpf_delete_prog() instead of extending the error
      handling path, since the software state is fully initialized
      at that point.  This handling of errors without jumping to
      the end of the function is error prone, as proven by later
      commit missing that extra call to __cls_bpf_delete_prog().
      
      __cls_bpf_delete_prog() is now expected to be invoked with
      a reference on exts->net or the field zeroed out.  The call
      on the offload's error patch does not fullfil this requirement,
      leading to each error stealing a reference on net namespace.
      
      Create a function undoing what cls_bpf_set_parms() did and
      use it from __cls_bpf_delete_prog() and the error path.
      
      Fixes: aae2c35e ("cls_bpf: use tcf_exts_get_net() before call_rcu()")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: NSimon Horman <simon.horman@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25415cec
    • E
      net/packet: fix a race in packet_bind() and packet_notifier() · 15fe076e
      Eric Dumazet 提交于
      syzbot reported crashes [1] and provided a C repro easing bug hunting.
      
      When/if packet_do_bind() calls __unregister_prot_hook() and releases
      po->bind_lock, another thread can run packet_notifier() and process an
      NETDEV_UP event.
      
      This calls register_prot_hook() and hooks again the socket right before
      first thread is able to grab again po->bind_lock.
      
      Fixes this issue by temporarily setting po->num to 0, as suggested by
      David Miller.
      
      [1]
      dev_remove_pack: ffff8801bf16fa80 not found
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      device syz0 entered promiscuous mode
      CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cc57a500 task.stack: ffff8801cc588000
      RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
      RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
      RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
      RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
      device syz0 entered promiscuous mode
      RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
      R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
      FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
       tun_detach drivers/net/tun.c:670 [inline]
       tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
       __fput+0x333/0x7f0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ae0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       SYSC_exit_group kernel/exit.c:979 [inline]
       SyS_exit_group+0x1d/0x20 kernel/exit.c:977
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x44ad19
      
      Fixes: 30f7ea1c ("packet: race condition in packet_bind")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15fe076e
    • M
      packet: fix crash in fanout_demux_rollover() · 57f015f5
      Mike Maloney 提交于
      syzkaller found a race condition fanout_demux_rollover() while removing
      a packet socket from a fanout group.
      
      po->rollover is read and operated on during packet_rcv_fanout(), via
      fanout_demux_rollover(), but the pointer is currently cleared before the
      synchronization in packet_release().   It is safer to delay the cleanup
      until after synchronize_net() has been called, ensuring all calls to
      packet_rcv_fanout() for this socket have finished.
      
      To further simplify synchronization around the rollover structure, set
      po->rollover in fanout_add() only if there are no errors.  This removes
      the need for rcu in the struct and in the call to
      packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).
      
      Crashing stack trace:
       fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
       packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
       dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
       xmit_one net/core/dev.c:2975 [inline]
       dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
       __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
       neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
       NF_HOOK_COND include/linux/netfilter.h:239 [inline]
       ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
       dst_output include/net/dst.h:459 [inline]
       NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
       mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
       mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
       mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
       ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
       addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
       addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
       process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432
      
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Fixes: 509c7a1e ("packet: avoid panic in packet_getsockopt()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMike Maloney <maloney@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57f015f5
    • X
      sctp: remove extern from stream sched · 1ba896f6
      Xin Long 提交于
      Now each stream sched ops is defined in different .c file and
      added into the global ops in another .c file, it uses extern
      to make this work.
      
      However extern is not good coding style to get them in and
      even make C=2 reports errors for this.
      
      This patch adds sctp_sched_ops_xxx_init for each stream sched
      ops in their .c file, then get them into the global ops by
      calling them when initializing sctp module.
      
      Fixes: 637784ad ("sctp: introduce priority based stream scheduler")
      Fixes: ac1ed8b8 ("sctp: introduce round robin stream scheduler")
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ba896f6
    • X
      sctp: force SCTP_ERROR_INV_STRM with __u32 when calling sctp_chunk_fail · 08f46070
      Xin Long 提交于
      This patch is to force SCTP_ERROR_INV_STRM with right type to
      fit in sctp_chunk_fail to avoid the sparse error.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      08f46070
  6. 28 11月, 2017 3 次提交
  7. 27 11月, 2017 10 次提交
  8. 26 11月, 2017 2 次提交
  9. 25 11月, 2017 2 次提交
  10. 24 11月, 2017 1 次提交
    • D
      rxrpc: Fix conn expiry timers · 3d18cbb7
      David Howells 提交于
      Fix the rxrpc connection expiry timers so that connections for closed
      AF_RXRPC sockets get deleted in a more timely fashion, freeing up the
      transport UDP port much more quickly.
      
       (1) Replace the delayed work items with work items plus timers so that
           timer_reduce() can be used to shorten them and so that the timer
           doesn't requeue the work item if the net namespace is dead.
      
       (2) Don't use queue_delayed_work() as that won't alter the timeout if the
           timer is already running.
      
       (3) Don't rearm the timers if the network namespace is dead.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      3d18cbb7