1. 25 4月, 2018 1 次提交
  2. 16 4月, 2018 1 次提交
    • E
      net: af_packet: fix race in PACKET_{R|T}X_RING · 5171b37d
      Eric Dumazet 提交于
      In order to remove the race caught by syzbot [1], we need
      to lock the socket before using po->tp_version as this could
      change under us otherwise.
      
      This means lock_sock() and release_sock() must be done by
      packet_set_ring() callers.
      
      [1] :
      BUG: KMSAN: uninit-value in packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
      CPU: 0 PID: 20195 Comm: syzkaller707632 Not tainted 4.16.0+ #83
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x185/0x1d0 lib/dump_stack.c:53
       kmsan_report+0x142/0x240 mm/kmsan/kmsan.c:1067
       __msan_warning_32+0x6c/0xb0 mm/kmsan/kmsan_instr.c:676
       packet_set_ring+0x1254/0x3870 net/packet/af_packet.c:4249
       packet_setsockopt+0x12c6/0x5a90 net/packet/af_packet.c:3662
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
       SyS_setsockopt+0x76/0xa0 net/socket.c:1828
       do_syscall_64+0x309/0x430 arch/x86/entry/common.c:287
       entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      RIP: 0033:0x449099
      RSP: 002b:00007f42b5307ce8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
      RAX: ffffffffffffffda RBX: 000000000070003c RCX: 0000000000449099
      RDX: 0000000000000005 RSI: 0000000000000107 RDI: 0000000000000003
      RBP: 0000000000700038 R08: 000000000000001c R09: 0000000000000000
      R10: 00000000200000c0 R11: 0000000000000246 R12: 0000000000000000
      R13: 000000000080eecf R14: 00007f42b53089c0 R15: 0000000000000001
      
      Local variable description: ----req_u@packet_setsockopt
      Variable was created at:
       packet_setsockopt+0x13f/0x5a90 net/packet/af_packet.c:3612
       SYSC_setsockopt+0x4b8/0x570 net/socket.c:1849
      
      Fixes: f6fb8f10 ("af-packet: TPACKET_V3 flexible buffer implementation.")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5171b37d
  3. 28 3月, 2018 1 次提交
  4. 13 2月, 2018 2 次提交
    • K
      net: Convert packet_net_ops · cb5e3400
      Kirill Tkhai 提交于
      These pernet_operations just create and destroy /proc entry,
      and another operations do not touch it.
      
      Also, nobody else are interested in foreign net::packet::sklist.
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb5e3400
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  5. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  6. 17 1月, 2018 1 次提交
    • A
      net: delete /proc THIS_MODULE references · 96890d62
      Alexey Dobriyan 提交于
      /proc has been ignoring struct file_operations::owner field for 10 years.
      Specifically, it started with commit 786d7e16
      ("Fix rmmod/read/write races in /proc entries"). Notice the chunk where
      inode->i_fop is initialized with proxy struct file_operations for
      regular files:
      
      	-               if (de->proc_fops)
      	-                       inode->i_fop = de->proc_fops;
      	+               if (de->proc_fops) {
      	+                       if (S_ISREG(inode->i_mode))
      	+                               inode->i_fop = &proc_reg_file_ops;
      	+                       else
      	+                               inode->i_fop = de->proc_fops;
      	+               }
      
      VFS stopped pinning module at this point.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      96890d62
  7. 20 12月, 2017 1 次提交
  8. 29 11月, 2017 2 次提交
    • E
      net/packet: fix a race in packet_bind() and packet_notifier() · 15fe076e
      Eric Dumazet 提交于
      syzbot reported crashes [1] and provided a C repro easing bug hunting.
      
      When/if packet_do_bind() calls __unregister_prot_hook() and releases
      po->bind_lock, another thread can run packet_notifier() and process an
      NETDEV_UP event.
      
      This calls register_prot_hook() and hooks again the socket right before
      first thread is able to grab again po->bind_lock.
      
      Fixes this issue by temporarily setting po->num to 0, as suggested by
      David Miller.
      
      [1]
      dev_remove_pack: ffff8801bf16fa80 not found
      ------------[ cut here ]------------
      kernel BUG at net/core/dev.c:7945!  ( BUG_ON(!list_empty(&dev->ptype_all)); )
      invalid opcode: 0000 [#1] SMP KASAN
      Dumping ftrace buffer:
         (ftrace buffer empty)
      Modules linked in:
      device syz0 entered promiscuous mode
      CPU: 0 PID: 3161 Comm: syzkaller404108 Not tainted 4.14.0+ #190
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      task: ffff8801cc57a500 task.stack: ffff8801cc588000
      RIP: 0010:netdev_run_todo+0x772/0xae0 net/core/dev.c:7945
      RSP: 0018:ffff8801cc58f598 EFLAGS: 00010293
      RAX: ffff8801cc57a500 RBX: dffffc0000000000 RCX: ffffffff841f75b2
      RDX: 0000000000000000 RSI: 1ffff100398b1ede RDI: ffff8801bf1f8810
      device syz0 entered promiscuous mode
      RBP: ffff8801cc58f898 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801bf1f8cd8
      R13: ffff8801cc58f870 R14: ffff8801bf1f8780 R15: ffff8801cc58f7f0
      FS:  0000000001716880(0000) GS:ffff8801db400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000020b13000 CR3: 0000000005e25000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       rtnl_unlock+0xe/0x10 net/core/rtnetlink.c:106
       tun_detach drivers/net/tun.c:670 [inline]
       tun_chr_close+0x49/0x60 drivers/net/tun.c:2845
       __fput+0x333/0x7f0 fs/file_table.c:210
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x199/0x270 kernel/task_work.c:113
       exit_task_work include/linux/task_work.h:22 [inline]
       do_exit+0x9bb/0x1ae0 kernel/exit.c:865
       do_group_exit+0x149/0x400 kernel/exit.c:968
       SYSC_exit_group kernel/exit.c:979 [inline]
       SyS_exit_group+0x1d/0x20 kernel/exit.c:977
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x44ad19
      
      Fixes: 30f7ea1c ("packet: race condition in packet_bind")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15fe076e
    • M
      packet: fix crash in fanout_demux_rollover() · 57f015f5
      Mike Maloney 提交于
      syzkaller found a race condition fanout_demux_rollover() while removing
      a packet socket from a fanout group.
      
      po->rollover is read and operated on during packet_rcv_fanout(), via
      fanout_demux_rollover(), but the pointer is currently cleared before the
      synchronization in packet_release().   It is safer to delay the cleanup
      until after synchronize_net() has been called, ensuring all calls to
      packet_rcv_fanout() for this socket have finished.
      
      To further simplify synchronization around the rollover structure, set
      po->rollover in fanout_add() only if there are no errors.  This removes
      the need for rcu in the struct and in the call to
      packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).
      
      Crashing stack trace:
       fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
       packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
       dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
       xmit_one net/core/dev.c:2975 [inline]
       dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
       __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
       neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
       neigh_output include/net/neighbour.h:482 [inline]
       ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
       ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
       NF_HOOK_COND include/linux/netfilter.h:239 [inline]
       ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
       dst_output include/net/dst.h:459 [inline]
       NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
       mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
       mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
       mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
       ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
       addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
       addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
       process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
       worker_thread+0x223/0x1990 kernel/workqueue.c:2247
       kthread+0x35e/0x430 kernel/kthread.c:231
       ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432
      
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Fixes: 509c7a1e ("packet: avoid panic in packet_getsockopt()")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NMike Maloney <maloney@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      57f015f5
  9. 28 11月, 2017 1 次提交
  10. 14 11月, 2017 1 次提交
  11. 25 10月, 2017 1 次提交
    • K
      net: af_packet: Convert timers to use timer_setup() · 17bfd8c8
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Mike Maloney <maloney@google.com>
      Cc: Jarno Rajahalme <jarno@ovn.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      17bfd8c8
  12. 21 10月, 2017 1 次提交
    • E
      packet: avoid panic in packet_getsockopt() · 509c7a1e
      Eric Dumazet 提交于
      syzkaller got crashes in packet_getsockopt() processing
      PACKET_ROLLOVER_STATS command while another thread was managing
      to change po->rollover
      
      Using RCU will fix this bug. We might later add proper RCU annotations
      for sparse sake.
      
      In v2: I replaced kfree(rollover) in fanout_add() to kfree_rcu()
      variant, as spotted by John.
      
      Fixes: a9b63918 ("packet: rollover statistics")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: John Sperbeck <jsperbeck@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      509c7a1e
  13. 29 9月, 2017 2 次提交
  14. 26 9月, 2017 1 次提交
  15. 21 9月, 2017 1 次提交
    • W
      packet: hold bind lock when rebinding to fanout hook · 008ba2a1
      Willem de Bruijn 提交于
      Packet socket bind operations must hold the po->bind_lock. This keeps
      po->running consistent with whether the socket is actually on a ptype
      list to receive packets.
      
      fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
      binds the fanout object to receive through packet_rcv_fanout.
      
      Make it hold the po->bind_lock when testing po->running and rebinding.
      Else, it can race with other rebind operations, such as that in
      packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
      can result in a socket being added to a fanout group twice, causing
      use-after-free KASAN bug reports, among others.
      
      Reported independently by both trinity and syzkaller.
      Verified that the syzkaller reproducer passes after this patch.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Reported-by: Nnixioaming <nixiaoming@huawei.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      008ba2a1
  16. 30 8月, 2017 1 次提交
  17. 11 8月, 2017 1 次提交
  18. 25 7月, 2017 1 次提交
    • W
      packet: fix use-after-free in prb_retire_rx_blk_timer_expired() · c800aaf8
      WANG Cong 提交于
      There are multiple reports showing we have a use-after-free in
      the timer prb_retire_rx_blk_timer_expired(), where we use struct
      tpacket_kbdq_core::pkbdq, a pg_vec, after it gets freed by
      free_pg_vec().
      
      The interesting part is it is not freed via packet_release() but
      via packet_setsockopt(), which means we are not closing the socket.
      Looking into the big and fat function packet_set_ring(), this could
      happen if we satisfy the following conditions:
      
      1. closing == 0, not on packet_release() path
      2. req->tp_block_nr == 0, we don't allocate a new pg_vec
      3. rx_ring->pg_vec is already set as V3, which means we already called
         packet_set_ring() wtih req->tp_block_nr > 0 previously
      4. req->tp_frame_nr == 0, pass sanity check
      5. po->mapped == 0, never called mmap()
      
      In this scenario we are clearing the old rx_ring->pg_vec, so we need
      to free this pg_vec, but we don't stop the timer on this path because
      of closing==0.
      
      The timer has to be stopped as long as we need to free pg_vec, therefore
      the check on closing!=0 is wrong, we should check pg_vec!=NULL instead.
      
      Thanks to liujian for testing different fixes.
      
      Reported-by: alexander.levin@verizon.com
      Reported-by: NDave Jones <davej@codemonkey.org.uk>
      Reported-by: Nliujian (CE) <liujian56@huawei.com>
      Tested-by: Nliujian (CE) <liujian56@huawei.com>
      Cc: Ding Tianhong <dingtianhong@huawei.com>
      Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c800aaf8
  19. 20 7月, 2017 1 次提交
  20. 14 7月, 2017 1 次提交
    • I
      net/packet: Fix Tx queue selection for AF_PACKET · ccd4eb49
      Iván Briano 提交于
      When PACKET_QDISC_BYPASS is not used, Tx queue selection will be done
      before the packet is enqueued, taking into account any mappings set by
      a queuing discipline such as mqprio without hardware offloading. This
      selection may be affected by a previously saved queue_mapping, either on
      the Rx path, or done before the packet reaches the device, as it's
      currently the case for AF_PACKET.
      
      In order for queue selection to work as expected when using traffic
      control, there can't be another selection done before that point is
      reached, so move the call to packet_pick_tx_queue to
      packet_direct_xmit, leaving the default xmit path as it was before
      PACKET_QDISC_BYPASS was introduced.
      
      A forward declaration of packet_pick_tx_queue() is introduced to avoid
      the need to reorder the functions within the file.
      
      Fixes: d346a3fa ("packet: introduce PACKET_QDISC_BYPASS socket option")
      Signed-off-by: NIván Briano <ivan.briano@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ccd4eb49
  21. 01 7月, 2017 3 次提交
  22. 11 6月, 2017 1 次提交
  23. 26 5月, 2017 1 次提交
  24. 16 5月, 2017 1 次提交
  25. 26 4月, 2017 1 次提交
  26. 25 4月, 2017 1 次提交
    • M
      packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id. · 4a69a864
      Mike Maloney 提交于
      Fanout uses a per net global namespace. A process that intends to create
      a new fanout group can accidentally join an existing group. It is not
      possible to detect this.
      
      Add socket option PACKET_FANOUT_FLAG_UNIQUEID.  When specified the
      supplied fanout group id must be set to 0, and the kernel chooses an id
      that is not already in use.  This is an ephemeral flag so that
      other sockets can be added to this group using setsockopt, but NOT
      specifying this flag.  The current getsockopt(..., PACKET_FANOUT, ...)
      can be used to retrieve the new group id.
      
      We assume that there are not a lot of fanout groups and that this is not
      a high frequency call.
      
      The method assigns ids starting at zero and increases until it finds an
      unused id.  It keeps track of the last assigned id, and uses it as a
      starting point to find new ids.
      Signed-off-by: NMike Maloney <maloney@google.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a69a864
  27. 31 3月, 2017 3 次提交
  28. 02 3月, 2017 1 次提交
    • A
      net: don't call strlen() on the user buffer in packet_bind_spkt() · 540e2894
      Alexander Potapenko 提交于
      KMSAN (KernelMemorySanitizer, a new error detection tool) reports use of
      uninitialized memory in packet_bind_spkt():
      Acked-by: NEric Dumazet <edumazet@google.com>
      
      ==================================================================
      BUG: KMSAN: use of unitialized memory
      CPU: 0 PID: 1074 Comm: packet Not tainted 4.8.0-rc6+ #1891
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
      01/01/2011
       0000000000000000 ffff88006b6dfc08 ffffffff82559ae8 ffff88006b6dfb48
       ffffffff818a7c91 ffffffff85b9c870 0000000000000092 ffffffff85b9c550
       0000000000000000 0000000000000092 00000000ec400911 0000000000000002
      Call Trace:
       [<     inline     >] __dump_stack lib/dump_stack.c:15
       [<ffffffff82559ae8>] dump_stack+0x238/0x290 lib/dump_stack.c:51
       [<ffffffff818a6626>] kmsan_report+0x276/0x2e0 mm/kmsan/kmsan.c:1003
       [<ffffffff818a783b>] __msan_warning+0x5b/0xb0
      mm/kmsan/kmsan_instr.c:424
       [<     inline     >] strlen lib/string.c:484
       [<ffffffff8259b58d>] strlcpy+0x9d/0x200 lib/string.c:144
       [<ffffffff84b2eca4>] packet_bind_spkt+0x144/0x230
      net/packet/af_packet.c:3132
       [<ffffffff84242e4d>] SYSC_bind+0x40d/0x5f0 net/socket.c:1370
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      chained origin: 00000000eba00911
       [<ffffffff810bb787>] save_stack_trace+0x27/0x50
      arch/x86/kernel/stacktrace.c:67
       [<     inline     >] kmsan_save_stack_with_flags mm/kmsan/kmsan.c:322
       [<     inline     >] kmsan_save_stack mm/kmsan/kmsan.c:334
       [<ffffffff818a59f8>] kmsan_internal_chain_origin+0x118/0x1e0
      mm/kmsan/kmsan.c:527
       [<ffffffff818a7773>] __msan_set_alloca_origin4+0xc3/0x130
      mm/kmsan/kmsan_instr.c:380
       [<ffffffff84242b69>] SYSC_bind+0x129/0x5f0 net/socket.c:1356
       [<ffffffff84242a22>] SyS_bind+0x82/0xa0 net/socket.c:1356
       [<ffffffff8515991b>] entry_SYSCALL_64_fastpath+0x13/0x8f
      arch/x86/entry/entry_64.o:?
      origin description: ----address@SYSC_bind (origin=00000000eb400911)
      ==================================================================
      (the line numbers are relative to 4.8-rc6, but the bug persists
      upstream)
      
      , when I run the following program as root:
      
      =====================================
       #include <string.h>
       #include <sys/socket.h>
       #include <netpacket/packet.h>
       #include <net/ethernet.h>
      
       int main() {
         struct sockaddr addr;
         memset(&addr, 0xff, sizeof(addr));
         addr.sa_family = AF_PACKET;
         int fd = socket(PF_PACKET, SOCK_PACKET, htons(ETH_P_ALL));
         bind(fd, &addr, sizeof(addr));
         return 0;
       }
      =====================================
      
      This happens because addr.sa_data copied from the userspace is not
      zero-terminated, and copying it with strlcpy() in packet_bind_spkt()
      results in calling strlen() on the kernel copy of that non-terminated
      buffer.
      Signed-off-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      540e2894
  29. 18 2月, 2017 1 次提交
    • A
      packet: Do not call fanout_release from atomic contexts · 2bd624b4
      Anoob Soman 提交于
      Commit 66644982 ("packet: call fanout_release, while UNREGISTERING a
      netdev"), unfortunately, introduced the following issues.
      
      1. calling mutex_lock(&fanout_mutex) (fanout_release()) from inside
      rcu_read-side critical section. rcu_read_lock disables preemption, most often,
      which prohibits calling sleeping functions.
      
      [  ] include/linux/rcupdate.h:560 Illegal context switch in RCU read-side critical section!
      [  ]
      [  ] rcu_scheduler_active = 1, debug_locks = 0
      [  ] 4 locks held by ovs-vswitchd/1969:
      [  ]  #0:  (cb_lock){++++++}, at: [<ffffffff8158a6c9>] genl_rcv+0x19/0x40
      [  ]  #1:  (ovs_mutex){+.+.+.}, at: [<ffffffffa04878ca>] ovs_vport_cmd_del+0x4a/0x100 [openvswitch]
      [  ]  #2:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81564157>] rtnl_lock+0x17/0x20
      [  ]  #3:  (rcu_read_lock){......}, at: [<ffffffff81614165>] packet_notifier+0x5/0x3f0
      [  ]
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810c9077>] lockdep_rcu_suspicious+0x107/0x110
      [  ]  [<ffffffff810a2da7>] ___might_sleep+0x57/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff810de93f>] ? vprintk_default+0x1f/0x30
      [  ]  [<ffffffff81186e88>] ? printk+0x4d/0x4f
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      2. calling mutex_lock(&fanout_mutex) inside spin_lock(&po->bind_lock).
      "sleeping function called from invalid context"
      
      [  ] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [  ] in_atomic(): 1, irqs_disabled(): 0, pid: 1969, name: ovs-vswitchd
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff810a2f52>] ___might_sleep+0x202/0x210
      [  ]  [<ffffffff810a2fd0>] __might_sleep+0x70/0x90
      [  ]  [<ffffffff8162e80c>] mutex_lock_nested+0x3c/0x3a0
      [  ]  [<ffffffff816106dd>] fanout_release+0x1d/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      3. calling dev_remove_pack(&fanout->prot_hook), from inside
      spin_lock(&po->bind_lock) or rcu_read-side critical-section. dev_remove_pack()
      -> synchronize_net(), which might sleep.
      
      [  ] BUG: scheduling while atomic: ovs-vswitchd/1969/0x00000002
      [  ] INFO: lockdep is turned off.
      [  ] Call Trace:
      [  ]  [<ffffffff813770c1>] dump_stack+0x85/0xc4
      [  ]  [<ffffffff81186274>] __schedule_bug+0x64/0x73
      [  ]  [<ffffffff8162b8cb>] __schedule+0x6b/0xd10
      [  ]  [<ffffffff8162c5db>] schedule+0x6b/0x80
      [  ]  [<ffffffff81630b1d>] schedule_timeout+0x38d/0x410
      [  ]  [<ffffffff810ea3fd>] synchronize_sched_expedited+0x53d/0x810
      [  ]  [<ffffffff810ea6de>] synchronize_rcu_expedited+0xe/0x10
      [  ]  [<ffffffff8154eab5>] synchronize_net+0x35/0x50
      [  ]  [<ffffffff8154eae3>] dev_remove_pack+0x13/0x20
      [  ]  [<ffffffff8161077e>] fanout_release+0xbe/0xe0
      [  ]  [<ffffffff81614459>] packet_notifier+0x2f9/0x3f0
      
      4. fanout_release() races with calls from different CPU.
      
      To fix the above problems, remove the call to fanout_release() under
      rcu_read_lock(). Instead, call __dev_remove_pack(&fanout->prot_hook) and
      netdev_run_todo will be happy that &dev->ptype_specific list is empty. In order
      to achieve this, I moved dev_{add,remove}_pack() out of fanout_{add,release} to
      __fanout_{link,unlink}. So, call to {,__}unregister_prot_hook() will make sure
      fanout->prot_hook is removed as well.
      
      Fixes: 66644982 ("packet: call fanout_release, while UNREGISTERING a netdev")
      Reported-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAnoob Soman <anoob.soman@citrix.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2bd624b4
  30. 15 2月, 2017 1 次提交
    • E
      packet: fix races in fanout_add() · d199fab6
      Eric Dumazet 提交于
      Multiple threads can call fanout_add() at the same time.
      
      We need to grab fanout_mutex earlier to avoid races that could
      lead to one thread freeing po->rollover that was set by another thread.
      
      Do the same in fanout_release(), for peace of mind, and to help us
      finding lockdep issues earlier.
      
      Fixes: dc99f600 ("packet: Add fanout support.")
      Fixes: 0648ab70 ("packet: rollover prepare: per-socket state")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d199fab6
  31. 09 2月, 2017 1 次提交
  32. 21 1月, 2017 1 次提交
  33. 05 1月, 2017 1 次提交