1. 10 3月, 2017 4 次提交
  2. 08 3月, 2017 10 次提交
    • W
      ipv6: reorder icmpv6_init() and ip6_mr_init() · 15e66807
      WANG Cong 提交于
      Andrey reported the following kernel crash:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      CPU: 0 PID: 14446 Comm: syz-executor6 Not tainted 4.10.0+ #82
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff88001f311700 task.stack: ffff88001f6e8000
      RIP: 0010:ip6mr_sk_done+0x15a/0x3d0 net/ipv6/ip6mr.c:1618
      RSP: 0018:ffff88001f6ef418 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 1ffff10003edde8c RCX: ffffc900043ee000
      RDX: 0000000000000004 RSI: ffffffff83e3b3f8 RDI: 0000000000000020
      RBP: ffff88001f6ef508 R08: fffffbfff0dcc5d8 R09: 0000000000000000
      R10: ffffffff86e62ec0 R11: 0000000000000000 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88001f6ef4e0 R15: ffff8800380a0040
      FS:  00007f7a52cec700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000061c500 CR3: 000000001f1ae000 CR4: 00000000000006f0
      DR0: 0000000020000000 DR1: 0000000020000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
      Call Trace:
       rawv6_close+0x4c/0x80 net/ipv6/raw.c:1217
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:432
       sock_release+0x8d/0x1e0 net/socket.c:597
       __sock_create+0x39d/0x880 net/socket.c:1226
       sock_create_kern+0x3f/0x50 net/socket.c:1243
       inet_ctl_sock_create+0xbb/0x280 net/ipv4/af_inet.c:1526
       icmpv6_sk_init+0x163/0x500 net/ipv6/icmp.c:954
       ops_init+0x10a/0x550 net/core/net_namespace.c:115
       setup_net+0x261/0x660 net/core/net_namespace.c:291
       copy_net_ns+0x27e/0x540 net/core/net_namespace.c:396
      9pnet_virtio: no channels available for device ./file1
       create_new_namespaces+0x437/0x9b0 kernel/nsproxy.c:106
       unshare_nsproxy_namespaces+0xae/0x1e0 kernel/nsproxy.c:205
       SYSC_unshare kernel/fork.c:2281 [inline]
       SyS_unshare+0x64e/0x1000 kernel/fork.c:2231
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      
      This is because net->ipv6.mr6_tables is not initialized at that point,
      ip6mr_rules_init() is not called yet, therefore on the error path when
      we iterator the list, we trigger this oops. Fix this by reordering
      ip6mr_rules_init() before icmpv6_sk_init().
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15e66807
    • E
      dccp: fix use-after-free in dccp_feat_activate_values · 62f8f4d9
      Eric Dumazet 提交于
      Dmitry reported crashes in DCCP stack [1]
      
      Problem here is that when I got rid of listener spinlock, I missed the
      fact that DCCP stores a complex state in struct dccp_request_sock,
      while TCP does not.
      
      Since multiple cpus could access it at the same time, we need to add
      protection.
      
      [1]
      BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0
      net/dccp/feat.c:1541 at addr ffff88003713be68
      Read of size 8 by task syz-executor2/8457
      CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:15 [inline]
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       kasan_object_err+0x1c/0x70 mm/kasan/report.c:162
       print_address_description mm/kasan/report.c:200 [inline]
       kasan_report_error mm/kasan/report.c:289 [inline]
       kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311
       kasan_report mm/kasan/report.c:332 [inline]
       __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332
       dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
       do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902
       </IRQ>
       do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328
       do_softirq kernel/softirq.c:176 [inline]
       __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181
       local_bh_enable include/linux/bottom_half.h:31 [inline]
       rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline]
       ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123
       ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162
       ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501
       inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179
       dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141
       dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280
       dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362
       dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:645
       SYSC_sendto+0x660/0x810 net/socket.c:1687
       SyS_sendto+0x40/0x50 net/socket.c:1655
       entry_SYSCALL_64_fastpath+0x1f/0xc2
      RIP: 0033:0x4458b9
      RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c
      RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9
      RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017
      RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020
      R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8
      R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700
      Object at ffff88003713be50, in cache kmalloc-64 size: 64
      Allocated:
      PID = 8446
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738
       kmalloc include/linux/slab.h:490 [inline]
       dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467
       dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487
       __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741
       dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949
       dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012
       dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423
       dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217
       dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377
       dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606
       dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632
       sk_backlog_rcv include/net/sock.h:893 [inline]
       __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479
       dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Freed:
      PID = 15
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514 [inline]
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1355 [inline]
       slab_free_freelist_hook mm/slub.c:1377 [inline]
       slab_free mm/slub.c:2954 [inline]
       kfree+0xe8/0x2b0 mm/slub.c:3874
       dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418
       dccp_feat_entry_destructor net/dccp/feat.c:416 [inline]
       dccp_feat_list_pop net/dccp/feat.c:541 [inline]
       dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543
       dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121
       dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457
       dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186
       dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711
       ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322
       dst_input include/net/dst.h:507 [inline]
       ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69
       NF_HOOK include/linux/netfilter.h:257 [inline]
       ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203
       __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190
       __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228
       process_backlog+0xe5/0x6c0 net/core/dev.c:4839
       napi_poll net/core/dev.c:5202 [inline]
       net_rx_action+0xe70/0x1900 net/core/dev.c:5267
       __do_softirq+0x2fb/0xb7d kernel/softirq.c:284
      Memory state around the buggy address:
       ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb
                                                                ^
      
      Fixes: 079096f1 ("tcp/dccp: install syn_recv requests into ehash table")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Tested-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      62f8f4d9
    • A
      net/sched: act_skbmod: remove unneeded rcu_read_unlock in tcf_skbmod_dump · 6c4dc75c
      Alexey Khoroshilov 提交于
      Found by Linux Driver Verification project (linuxtesting.org).
      Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c4dc75c
    • S
      rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races · b21dd450
      Sowmini Varadhan 提交于
      Commit a93d01f5 ("RDS: TCP: avoid bad page reference in
      rds_tcp_listen_data_ready") added the function
      rds_tcp_listen_sock_def_readable()  to handle the case when a
      partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
      However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
      through a tear-down via rds_tcp_listen_stop(), the (*ready)() will be
      null and we would hit a panic  of the form
        BUG: unable to handle kernel NULL pointer dereference at   (null)
        IP:           (null)
         :
        ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
        tcp_data_queue+0x39d/0x5b0
        tcp_rcv_established+0x2e5/0x660
        tcp_v4_do_rcv+0x122/0x220
        tcp_v4_rcv+0x8b7/0x980
          :
      In the above case, it is not fatal to encounter a NULL value for
      ready- we should just drop the packet and let the flush of the
      acceptor thread finish gracefully.
      
      In general, the tear-down sequence for listen() and accept() socket
      that is ensured by this commit is:
           rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
           In rds_tcp_listen_stop():
               serialize with, and prevent, further callbacks using lock_sock()
               flush rds_wq
               flush acceptor workq
               sock_release(listen socket)
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b21dd450
    • S
      rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races · 16c09b1c
      Sowmini Varadhan 提交于
      Order of initialization in rds_tcp_init needs to be done so
      that resources are set up and destroyed in the correct synchronization
      sequence with both the data path, as well as netns create/destroy
      path. Specifically,
      
      - we must call register_pernet_subsys and get the rds_tcp_netid
        before calling register_netdevice_notifier, otherwise we risk
        the sequence
          1. register_netdevice_notifier sets up netdev notifier callback
          2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds
             the wrong rtn, resulting in a panic with string that is of the form:
      
        BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
        IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp]
               :
      
      - the rds_tcp_incoming_slab kmem_cache must be initialized before the
        datapath starts up. The latter can happen any time after the
        pernet_subsys registration of rds_tcp_net_ops, whose -> init
        function sets up the listen socket. If the rds_tcp_incoming_slab has
        not been set up at that time, a panic of the form below may be
        encountered
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
        IP: kmem_cache_alloc+0x90/0x1c0
           :
        rds_tcp_data_recv+0x1e7/0x370 [rds_tcp]
        tcp_read_sock+0x96/0x1c0
        rds_tcp_recv_path+0x65/0x80 [rds_tcp]
           :
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16c09b1c
    • S
      rds: tcp: Take explicit refcounts on struct net · 8edc3aff
      Sowmini Varadhan 提交于
      It is incorrect for the rds_connection to piggyback on the
      sock_net() refcount for the netns because this gives rise to
      a chicken-and-egg problem during rds_conn_destroy. Instead explicitly
      take a ref on the net, and hold the netns down till the connection
      tear-down is complete.
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8edc3aff
    • E
      net: fix socket refcounting in skb_complete_tx_timestamp() · 9ac25fc0
      Eric Dumazet 提交于
      TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
      By the time TX completion happens, sk_refcnt might be already 0.
      
      sock_hold()/sock_put() would then corrupt critical state, like
      sk_wmem_alloc and lead to leaks or use after free.
      
      Fixes: 62bccb8c ("net-timestamp: Make the clone operation stand-alone from phy timestamping")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ac25fc0
    • E
      net: fix socket refcounting in skb_complete_wifi_ack() · dd4f1072
      Eric Dumazet 提交于
      TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt
      By the time TX completion happens, sk_refcnt might be already 0.
      
      sock_hold()/sock_put() would then corrupt critical state, like
      sk_wmem_alloc.
      
      Fixes: bf7fa551 ("mac80211: Resolve sk_refcnt/sk_wmem_alloc issue in wifi ack path")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: NSoheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd4f1072
    • D
      rxrpc: Call state should be read with READ_ONCE() under some circumstances · 146d8fef
      David Howells 提交于
      The call state may be changed at any time by the data-ready routine in
      response to received packets, so if the call state is to be read and acted
      upon several times in a function, READ_ONCE() must be used unless the call
      state lock is held.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      146d8fef
    • E
      tcp: fix various issues for sockets morphing to listen state · 02b2faaf
      Eric Dumazet 提交于
      Dmitry Vyukov reported a divide by 0 triggered by syzkaller, exploiting
      tcp_disconnect() path that was never really considered and/or used
      before syzkaller ;)
      
      I was not able to reproduce the bug, but it seems issues here are the
      three possible actions that assumed they would never trigger on a
      listener.
      
      1) tcp_write_timer_handler
      2) tcp_delack_timer_handler
      3) MTU reduction
      
      Only IPv6 MTU reduction was properly testing TCP_CLOSE and TCP_LISTEN
       states from tcp_v6_mtu_reduced()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02b2faaf
  3. 04 3月, 2017 3 次提交
  4. 03 3月, 2017 10 次提交
    • P
      netfilter: nf_tables: don't call nfnetlink_set_err() if nfnetlink_send() fails · 25e94a99
      Pablo Neira Ayuso 提交于
      The underlying nlmsg_multicast() already sets sk->sk_err for us to
      notify socket overruns, so we should not do anything with this return
      value. So we just call nfnetlink_set_err() if:
      
      1) We fail to allocate the netlink message.
      
      or
      
      2) We don't have enough space in the netlink message to place attributes,
         which means that we likely need to allocate a larger message.
      
      Before this patch, the internal ESRCH netlink error code was propagated
      to userspace, which is quite misleading. Netlink semantics mandate that
      listeners just hit ENOBUFS if the socket buffer overruns.
      Reported-by: NAlexander Alemayhu <alexander@alemayhu.com>
      Tested-by: NAlexander Alemayhu <alexander@alemayhu.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      25e94a99
    • P
      netfilter: nft_set_rbtree: incorrect assumption on lower interval lookups · f9121355
      Pablo Neira Ayuso 提交于
      In case of adjacent ranges, we may indeed see either the high part of
      the range in first place or the low part of it. Remove this incorrect
      assumption, let's make sure we annotate the low part of the interval in
      case of we have adjacent interva intervals so we hit a matching in
      lookups.
      Reported-by: NSimon Hanisch <hanisch@wh2.tu-dresden.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      f9121355
    • C
      netfilter: nf_conntrack_sip: fix wrong memory initialisation · da2f27e9
      Christophe Leroy 提交于
      In commit 82de0be6 ("netfilter: Add helper array
      register/unregister functions"),
      struct nf_conntrack_helper sip[MAX_PORTS][4] was changed to
      sip[MAX_PORTS * 4], so the memory init should have been changed to
      memset(&sip[4 * i], 0, 4 * sizeof(sip[i]));
      
      But as the sip[] table is allocated in the BSS, it is already set to 0
      
      Fixes: 82de0be6 ("netfilter: Add helper array register/unregister functions")
      Signed-off-by: NChristophe Leroy <christophe.leroy@c-s.fr>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      da2f27e9
    • I
      sched/headers: Move task_struct::signal and task_struct::sighand types and... · c3edc401
      Ingo Molnar 提交于
      sched/headers: Move task_struct::signal and task_struct::sighand types and accessors into <linux/sched/signal.h>
      
      task_struct::signal and task_struct::sighand are pointers, which would normally make it
      straightforward to not define those types in sched.h.
      
      That is not so, because the types are accompanied by a myriad of APIs (macros and inline
      functions) that dereference them.
      
      Split the types and the APIs out of sched.h and move them into a new header, <linux/sched/signal.h>.
      
      With this change sched.h does not know about 'struct signal' and 'struct sighand' anymore,
      trying to put accessors into sched.h as a test fails the following way:
      
        ./include/linux/sched.h: In function ‘test_signal_types’:
        ./include/linux/sched.h:2461:18: error: dereferencing pointer to incomplete type ‘struct signal_struct’
                          ^
      
      This reduces the size and complexity of sched.h significantly.
      
      Update all headers and .c code that relied on getting the signal handling
      functionality from <linux/sched.h> to include <linux/sched/signal.h>.
      
      The list of affected files in the preparatory patch was partly generated by
      grepping for the APIs, and partly by doing coverage build testing, both
      all[yes|mod|def|no]config builds on 64-bit and 32-bit x86, and an array of
      cross-architecture builds.
      
      Nevertheless some (trivial) build breakage is still expected related to rare
      Kconfig combinations and in-flight patches to various kernel code, but most
      of it should be handled by this patch.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      c3edc401
    • W
      ipv6: ignore null_entry in inet6_rtm_getroute() too · 9d6acb3b
      WANG Cong 提交于
      Like commit 1f17e2f2 ("net: ipv6: ignore null_entry on route dumps"),
      we need to ignore null entry in inet6_rtm_getroute() too.
      
      Return -ENETUNREACH here to sync with IPv4 behavior, as suggested by David.
      
      Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d6acb3b
    • W
      tcp: fix potential double free issue for fastopen_req · 7db92362
      Wei Wang 提交于
      tp->fastopen_req could potentially be double freed if a malicious
      user does the following:
      1. Enable TCP_FASTOPEN_CONNECT sockopt and do a connect() on the socket.
      2. Call connect() with AF_UNSPEC to disconnect the socket.
      3. Make this socket a listening socket by calling listen().
      4. Accept incoming connections and generate child sockets. All child
         sockets will get a copy of the pointer of fastopen_req.
      5. Call close() on all sockets. fastopen_req will get freed multiple
         times.
      
      Fixes: 19f6d3f3 ("net/tcp-fastopen: Add new API support")
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7db92362
    • A
      net: Introduce sk_clone_lock() error path routine · 94352d45
      Arnaldo Carvalho de Melo 提交于
      When handling problems in cloning a socket with the sk_clone_locked()
      function we need to perform several steps that were open coded in it and
      its callers, so introduce a routine to avoid this duplication:
      sk_free_unlock_clone().
      
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/n/net-ui6laqkotycunhtmqryl9bfx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      94352d45
    • A
      dccp: Unlock sock before calling sk_free() · d5afb6f9
      Arnaldo Carvalho de Melo 提交于
      The code where sk_clone() came from created a new socket and locked it,
      but then, on the error path didn't unlock it.
      
      This problem stayed there for a long while, till b0691c8e ("net:
      Unlock sock before calling sk_free()") fixed it, but unfortunately the
      callers of sk_clone() (now sk_clone_locked()) were not audited and the
      one in dccp_create_openreq_child() remained.
      
      Now in the age of the syskaller fuzzer, this was finally uncovered, as
      reported by Dmitry:
      
       ---- 8< ----
      
      I've got the following report while running syzkaller fuzzer on
      86292b33 ("Merge branch 'akpm' (patches from Andrew)")
      
        [ BUG: held lock freed! ]
        4.10.0+ #234 Not tainted
        -------------------------
        syz-executor6/6898 is freeing memory
        ffff88006286cac0-ffff88006286d3b7, with a lock still held there!
         (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
        include/linux/spinlock.h:299 [inline]
         (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
        sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
        5 locks held by syz-executor6/6898:
         #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>] lock_sock
        include/net/sock.h:1460 [inline]
         #0:  (sk_lock-AF_INET6){+.+.+.}, at: [<ffffffff839a34b4>]
        inet_stream_connect+0x44/0xa0 net/ipv4/af_inet.c:681
         #1:  (rcu_read_lock){......}, at: [<ffffffff83bc1c2a>]
        inet6_csk_xmit+0x12a/0x5d0 net/ipv6/inet6_connection_sock.c:126
         #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_unlink
        include/linux/skbuff.h:1767 [inline]
         #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>] __skb_dequeue
        include/linux/skbuff.h:1783 [inline]
         #2:  (rcu_read_lock){......}, at: [<ffffffff8369b424>]
        process_backlog+0x264/0x730 net/core/dev.c:4835
         #3:  (rcu_read_lock){......}, at: [<ffffffff83aeb5c0>]
        ip6_input_finish+0x0/0x1700 net/ipv6/ip6_input.c:59
         #4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>] spin_lock
        include/linux/spinlock.h:299 [inline]
         #4:  (slock-AF_INET6){+.-...}, at: [<ffffffff8362c2c9>]
        sk_clone_lock+0x3d9/0x12c0 net/core/sock.c:1504
      
      Fix it just like was done by b0691c8e ("net: Unlock sock before calling
      sk_free()").
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170301153510.GE15145@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5afb6f9
    • P
      openvswitch: actions: fixed a brace coding style warning · f1304f7b
      Peter Downs 提交于
      Fixed a brace coding style warning reported by checkpatch.pl
      Signed-off-by: NPeter Downs <padowns@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f1304f7b
    • W
      ipv6: check for ip6_null_entry in __ip6_del_rt_siblings() · e3330039
      WANG Cong 提交于
      Andrey reported a NULL pointer deref bug in ipv6_route_ioctl()
      -> ip6_route_del() -> __ip6_del_rt_siblings() code path. This is
      because ip6_null_entry is returned in this path since ip6_null_entry
      is kinda default for a ipv6 route table root node. Quote from
      David Ahern:
      
       ip6_null_entry is the root of all ipv6 fib tables making it integrated
       into the table ...
      
      We should ignore any attempt of trying to delete it, like we do in
      __ip6_del_rt() path and several others.
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Fixes: 0ae81335 ("net: ipv6: Allow shorthand delete of all nexthops in multipath route")
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3330039
  5. 02 3月, 2017 13 次提交
    • I
      sched/headers: Prepare to move the task_lock()/unlock() APIs to <linux/sched/task.h> · f719ff9b
      Ingo Molnar 提交于
      But first update the code that uses these facilities with the
      new header.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      f719ff9b
    • I
      sched/headers: Prepare to use <linux/rcuupdate.h> instead of <linux/rculist.h> in <linux/sched.h> · b2d09103
      Ingo Molnar 提交于
      We don't actually need the full rculist.h header in sched.h anymore,
      we will be able to include the smaller rcupdate.h header instead.
      
      But first update code that relied on the implicit header inclusion.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      b2d09103
    • I
      sched/headers: Prepare to move the memalloc_noio_*() APIs to <linux/sched/mm.h> · 5b3cc15a
      Ingo Molnar 提交于
      Update the .c files that depend on these APIs.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5b3cc15a
    • I
      sched/headers: Prepare to move signal wakeup & sigpending methods from... · 174cd4b1
      Ingo Molnar 提交于
      sched/headers: Prepare to move signal wakeup & sigpending methods from <linux/sched.h> into <linux/sched/signal.h>
      
      Fix up affected files that include this signal functionality via sched.h.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      174cd4b1
    • I
      sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> · 5b825c3a
      Ingo Molnar 提交于
      Add #include <linux/cred.h> dependencies to all .c files rely on sched.h
      doing that for them.
      
      Note that even if the count where we need to add extra headers seems high,
      it's still a net win, because <linux/sched.h> is included in over
      2,200 files ...
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5b825c3a
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/user.h> · 8703e8a4
      Ingo Molnar 提交于
      We are going to split <linux/sched/user.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/user.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8703e8a4
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/signal.h> · 3f07c014
      Ingo Molnar 提交于
      We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/signal.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      3f07c014
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/loadavg.h> · 4f17722c
      Ingo Molnar 提交于
      We are going to split <linux/sched/loadavg.h> out of <linux/sched.h>, which
      will have to be picked up from a couple of .c files.
      
      Create a trivial placeholder <linux/sched/topology.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4f17722c
    • I
      sched/headers: Prepare for new header dependencies before moving code to <linux/sched/clock.h> · e6017571
      Ingo Molnar 提交于
      We are going to split <linux/sched/clock.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and .c files.
      
      Create a trivial placeholder <linux/sched/clock.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      Include the new header in the files that are going to need it.
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e6017571
    • J
      average: change to declare precision, not factor · eb1e011a
      Johannes Berg 提交于
      Declaring the factor is counter-intuitive, and people are prone
      to using small(-ish) values even when that makes no sense.
      
      Change the DECLARE_EWMA() macro to take the fractional precision,
      in bits, rather than a factor, and update all users.
      
      While at it, add some more documentation.
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      eb1e011a
    • E
      ipv6: orphan skbs in reassembly unit · 48cac18e
      Eric Dumazet 提交于
      Andrey reported a use-after-free in IPv6 stack.
      
      Issue here is that we free the socket while it still has skb
      in TX path and in some queues.
      
      It happens here because IPv6 reassembly unit messes skb->truesize,
      breaking skb_set_owner_w() badly.
      
      We fixed a similar issue for IPV4 in commit 8282f274 ("inet: frag:
      Always orphan skbs inside ip_defrag()")
      Acked-by: NJoe Stringer <joe@ovn.org>
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
      Read of size 8 at addr ffff880062da0060 by task a.out/4140
      
      page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
      raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
      page dumped because: kasan: bad access detected
      
      CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       describe_address mm/kasan/report.c:262
       kasan_report_error+0x121/0x560 mm/kasan/report.c:370
       kasan_report mm/kasan/report.c:392
       __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
       sock_flag ./arch/x86/include/asm/bitops.h:324
       sock_wfree+0x118/0x120 net/core/sock.c:1631
       skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put ./include/net/inet_frag.h:133
       nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613
       rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x620 net/socket.c:848
       new_sync_write fs/read_write.c:499
       __vfs_write+0x483/0x760 fs/read_write.c:512
       vfs_write+0x187/0x530 fs/read_write.c:560
       SYSC_write fs/read_write.c:607
       SyS_write+0xfb/0x230 fs/read_write.c:599
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      RIP: 0033:0x7ff26e6f5b79
      RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
      RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
      RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
      
      The buggy address belongs to the object at ffff880062da0000
       which belongs to the cache RAWv6 of size 1504
      The buggy address ffff880062da0060 is located 96 bytes inside
       of 1504-byte region [ffff880062da0000, ffff880062da05e0)
      
      Freed by task 4113:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1352
       slab_free_freelist_hook mm/slub.c:1374
       slab_free mm/slub.c:2951
       kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
       sk_prot_free net/core/sock.c:1377
       __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sk_free+0x23/0x30 net/core/sock.c:1479
       sock_put ./include/net/sock.h:1638
       sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
       rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
       sock_release+0x8d/0x1e0 net/socket.c:599
       sock_close+0x16/0x20 net/socket.c:1063
       __fput+0x332/0x7f0 fs/file_table.c:208
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x19b/0x270 kernel/task_work.c:116
       exit_task_work ./include/linux/task_work.h:21
       do_exit+0x186b/0x2800 kernel/exit.c:839
       do_group_exit+0x149/0x420 kernel/exit.c:943
       SYSC_exit_group kernel/exit.c:954
       SyS_exit_group+0x1d/0x20 kernel/exit.c:952
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
       slab_post_alloc_hook mm/slab.h:432
       slab_alloc_node mm/slub.c:2708
       slab_alloc mm/slub.c:2716
       kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
       sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
       sk_alloc+0x105/0x1010 net/core/sock.c:1396
       inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
       __sock_create+0x4f6/0x880 net/socket.c:1199
       sock_create net/socket.c:1239
       SYSC_socket net/socket.c:1269
       SyS_socket+0xf9/0x230 net/socket.c:1249
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Memory state around the buggy address:
       ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      Reported-by: NAndrey Konovalov <andreyknvl@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48cac18e
    • E
      net: net_enable_timestamp() can be called from irq contexts · 13baa00a
      Eric Dumazet 提交于
      It is now very clear that silly TCP listeners might play with
      enabling/disabling timestamping while new children are added
      to their accept queue.
      
      Meaning net_enable_timestamp() can be called from BH context
      while current state of the static key is not enabled.
      
      Lets play safe and allow all contexts.
      
      The work queue is scheduled only under the problematic cases,
      which are the static key enable/disable transition, to not slow down
      critical paths.
      
      This extends and improves what we did in commit 5fa8bbda ("net: use
      a work queue to defer net_disable_timestamp() work")
      
      Fixes: b90e5794 ("net: dont call jump_label_dec from irq context")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13baa00a
    • A
      net: don't call strlen() on the user buffer in packet_bind_spkt() · 540e2894
      Alexander Potapenko 提交于
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
      uninitialized memory in packet_bind_spkt():
      Acked-by: NEric Dumazet <edumazet@google.com>
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
       ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
       0000000000000000 0000000000000092 00000000ec400911 0000000000000002
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
       [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
       [<ffffffff818a783b>] __msan_warning+0x5b/0xb0
      mm/kmsan/kmsan_instr.c:424
       [<     inline     >] strlen lib/string.c:484
       [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
       [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230
      net/packet/af_packet.c:3132
       [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      chained origin: 00000000eba00911
       [<ffffffff810bb787>] save_stack_trace+0x27/0x50
      arch/x86/kernel/stacktrace.c:67
       [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
       [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
       [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0
      mm/kmsan/kmsan.c:527
       [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130
      mm/kmsan/kmsan_instr.c:380
       [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      origin description: ----address@SYSC_bind (origin=00000000eb400911)
      ==================================================================
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      , when I run the following program as root:
      
      =====================================
       #include <string.h>
       #include <sys/socket.h>
       #include <netpacket/packet.h>
       #include <net/ethernet.h>
      
       int main() {
         struct sockaddr addr;
         memset(&addr, 0xff, sizeof(addr));
         addr.sa_family = AF_PACKET;
         int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
         bind(fd, &addr, sizeof(addr));
         return 0;
       }
      =====================================
      
      This happens because addr.sa_data copied from the userspace is not
      zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
      results in calling strlen() on the kernel copy of that non-terminated
      buffer.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      540e2894