1. 13 4月, 2020 2 次提交
    • W
      mm/page_poison: expose page_poisoning_enabled to kernel modules · 1b7f4f07
      Wei Wang 提交于
      to #26589565
      
      commit d95f58f4a6ca002126c36c530fb096519c94baef upstream
      
      In some usages, e.g. virtio-balloon, a kernel module needs to know if
      page poisoning is in use. This patch exposes the page_poisoning_enabled
      function to kernel modules.
      Signed-off-by: NWei Wang <wei.w.wang@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      1b7f4f07
    • W
      virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT · 0db17e11
      Wei Wang 提交于
      to #26589565
      
      commit 86a559787e6f5cf662c081363f64a20cad654195 upstream
      
      Negotiation of the VIRTIO_BALLOON_F_FREE_PAGE_HINT feature indicates the
      support of reporting hints of guest free pages to host via virtio-balloon.
      Currenlty, only free page blocks of MAX_ORDER - 1 are reported. They are
      obtained one by one from the mm free list via the regular allocation
      function.
      
      Host requests the guest to report free page hints by sending a new cmd id
      to the guest via the free_page_report_cmd_id configuration register. When
      the guest starts to report, it first sends a start cmd to host via the
      free page vq, which acks to host the cmd id received. When the guest
      finishes reporting free pages, a stop cmd is sent to host via the vq.
      Host may also send a stop cmd id to the guest to stop the reporting.
      
      VIRTIO_BALLOON_CMD_ID_STOP: Host sends this cmd to stop the guest reporting.
      VIRTIO_BALLOON_CMD_ID_DONE: Host sends this cmd to tell the guest that
      the reported pages are ready to be freed.
      
      Why does the guest free the reported pages when host tells it is ready to free?
      This is because freeing pages appears to be expensive for live migration.
      free_pages() dirties memory very quickly and makes the live migraion not
      converge in some cases. So it is good to delay the free_page operation
      when the migration is done, and host sends a command to guest about that.
      
      Why do we need the new VIRTIO_BALLOON_CMD_ID_DONE, instead of reusing
      VIRTIO_BALLOON_CMD_ID_STOP?
      This is because live migration is usually done in several rounds. At the
      end of each round, host needs to send a VIRTIO_BALLOON_CMD_ID_STOP cmd to
      the guest to stop (or say pause) the reporting. The guest resumes the
      reporting when it receives a new command id at the beginning of the next
      round. So we need a new cmd id to distinguish between "stop reporting" and
      "ready to free the reported pages".
      
      TODO:
      - Add a batch page allocation API to amortize the allocation overhead.
      Signed-off-by: NWei Wang <wei.w.wang@intel.com>
      Signed-off-by: NLiang Li <liang.z.li@intel.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com>
      0db17e11
  2. 10 4月, 2020 1 次提交
  3. 09 4月, 2020 1 次提交
  4. 07 4月, 2020 2 次提交
  5. 02 4月, 2020 5 次提交
    • E
      vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console · c2d10e03
      Eric Biggers 提交于
      fix #25967152
      
      commit ca4463bf8438b403596edd0ec961ca0d4fbe0220 upstream
      
      The VT_DISALLOCATE ioctl can free a virtual console while tty_release()
      is still running, causing a use-after-free in con_shutdown().  This
      occurs because VT_DISALLOCATE considers a virtual console's
      'struct vc_data' to be unused as soon as the corresponding tty's
      refcount hits 0.  But actually it may be still being closed.
      
      Fix this by making vc_data be reference-counted via the embedded
      'struct tty_port'.  A newly allocated virtual console has refcount 1.
      Opening it for the first time increments the refcount to 2.  Closing it
      for the last time decrements the refcount (in tty_operations::cleanup()
      so that it happens late enough), as does VT_DISALLOCATE.
      
      Reproducer:
      	#include <fcntl.h>
      	#include <linux/vt.h>
      	#include <sys/ioctl.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		if (fork()) {
      			for (;;)
      				close(open("/dev/tty5", O_RDWR));
      		} else {
      			int fd = open("/dev/tty10", O_RDWR);
      
      			for (;;)
      				ioctl(fd, VT_DISALLOCATE, 5);
      		}
      	}
      
      KASAN report:
      	BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
      	Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129
      
      	CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
      	Call Trace:
      	 [...]
      	 con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
      	 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514
      	 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629
      	 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789
      	 [...]
      
      	Allocated by task 129:
      	 [...]
      	 kzalloc include/linux/slab.h:669 [inline]
      	 vc_allocate drivers/tty/vt/vt.c:1085 [inline]
      	 vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066
      	 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229
      	 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline]
      	 tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341
      	 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
      	 tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
      	 [...]
      
      	Freed by task 130:
      	 [...]
      	 kfree+0xbf/0x1e0 mm/slab.c:3757
      	 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline]
      	 vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818
      	 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
      	 [...]
      
      Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle")
      Cc: <stable@vger.kernel.org> # v3.4+
      Reported-by: syzbot+522643ab5729b0421998@syzkaller.appspotmail.com
      Acked-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20200322034305.210082-2-ebiggers@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      c2d10e03
    • E
      vt: vt_ioctl: fix use-after-free in vt_in_use() · ea64290b
      Eric Biggers 提交于
      fix #25967152
      
      commit 7cf64b18b0b96e751178b8d0505d8466ff5a448f upstream
      
      vt_in_use() dereferences console_driver->ttys[i] without proper locking.
      This is broken because the tty can be closed and freed concurrently.
      
      We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
      and skipping the check of tty_struct::count.  But, looking at
      console_driver->ttys[i] isn't really appropriate anyway because even if
      it is NULL the tty can still be in the process of being closed.
      
      Instead, fix it by making vt_in_use() require console_lock() and check
      whether the vt is allocated and has port refcount > 1.  This works since
      following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
      virtual console" the port refcount is incremented while the vt is open.
      
      Reproducer (very unreliable, but it worked for me after a few minutes):
      
      	#include <fcntl.h>
      	#include <linux/vt.h>
      
      	int main()
      	{
      		int fd, nproc;
      		struct vt_stat state;
      		char ttyname[16];
      
      		fd = open("/dev/tty10", O_RDONLY);
      		for (nproc = 1; nproc < 8; nproc *= 2)
      			fork();
      		for (;;) {
      			sprintf(ttyname, "/dev/tty%d", rand() % 8);
      			close(open(ttyname, O_RDONLY));
      			ioctl(fd, VT_GETSTATE, &state);
      		}
      	}
      
      KASAN report:
      
      	BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
      	BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
      	Read of size 4 at addr ffff888065722468 by task syz-vt2/132
      
      	CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
      	Call Trace:
      	 [...]
      	 vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
      	 vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
      	 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
      	 [...]
      
      	Allocated by task 136:
      	 [...]
      	 kzalloc include/linux/slab.h:669 [inline]
      	 alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
      	 tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
      	 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
      	 tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
      	 [...]
      
      	Freed by task 41:
      	 [...]
      	 kfree+0xbf/0x200 mm/slab.c:3757
      	 free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
      	 release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
      	 process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
      	 worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
      	 [...]
      
      Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle")
      Cc: <stable@vger.kernel.org> # v3.4+
      Acked-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      ea64290b
    • J
      vt: ioctl, switch VT_IS_IN_USE and VT_BUSY to inlines · 5e8474ca
      Jiri Slaby 提交于
      fix #25967152
      
      commit e587e8f17433ddb26954f0edf5b2f95c42155ae9 upstream
      
      These two were macros. Switch them to static inlines, so that it's more
      understandable what they are doing.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Link: https://lore.kernel.org/r/20200219073951.16151-2-jslaby@suse.czSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      5e8474ca
    • J
      vt: selection, introduce vc_is_sel · 0b9c1057
      Jiri Slaby 提交于
      fix #25967152
      
      commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream
      
      Avoid global variables (namely sel_cons) by introducing vc_is_sel. It
      checks whether the parameter is the current selection console. This will
      help putting sel_cons to a struct later.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.czSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      0b9c1057
    • J
      io_uring: use current task creds instead of allocating a new one · 311b786d
      Jens Axboe 提交于
      fix #26374723
      
      commit 0b8c0ec7eedcd8f9f1a1f238d87f9b512b09e71a upstream.
      
      syzbot reports:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
        kthread+0x361/0x430 kernel/kthread.c:255
        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace f2e1a4307fbe2245 ]---
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is caused by slab fault injection triggering a failure in
      prepare_creds(). We don't actually need to create a copy of the creds
      as we're not modifying it, we just need a reference on the current task
      creds. This avoids the failure case as well, and propagates the const
      throughout the stack.
      
      Fixes: 181e448d8709 ("io_uring: async workers should inherit the user creds")
      Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      311b786d
  6. 26 3月, 2020 1 次提交
  7. 25 3月, 2020 5 次提交
  8. 20 3月, 2020 2 次提交
    • A
      vfs: fix do_last() regression · 6073719d
      Al Viro 提交于
      commit 6404674acd596de41fd3ad5f267b4525494a891a upstream
      
      Brown paperbag time: fetching ->i_uid/->i_mode really should've been
      done from nd->inode.  I even suggested that, but the reason for that has
      slipped through the cracks and I went for dir->d_inode instead - made
      for more "obvious" patch.
      
      Analysis:
      
       - at the entry into do_last() and all the way to step_into(): dir (aka
         nd->path.dentry) is known not to have been freed; so's nd->inode and
         it's equal to dir->d_inode unless we are already doomed to -ECHILD.
         inode of the file to get opened is not known.
      
       - after step_into(): inode of the file to get opened is known; dir
         might be pointing to freed memory/be negative/etc.
      
       - at the call of may_create_in_sticky(): guaranteed to be out of RCU
         mode; inode of the file to get opened is known and pinned; dir might
         be garbage.
      
      The last was the reason for the original patch.  Except that at the
      do_last() entry we can be in RCU mode and it is possible that
      nd->path.dentry->d_inode has already changed under us.
      
      In that case we are going to fail with -ECHILD, but we need to be
      careful; nd->inode is pointing to valid struct inode and it's the same
      as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
      should use that.
      Reported-by: N"Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
      Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
      Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      6073719d
    • J
      io-wq: wait for io_wq_create() to setup necessary workers · 4c628e9d
      Jens Axboe 提交于
      commit b60fda6000a99a7ccac36005ab78b14b47c06de3 upstream
      
      We currently have a race where if setup is really slow, we can be
      calling io_wq_destroy() before we're done setting up. This will cause
      the caller to get stuck waiting for the manager to set things up, but
      the manager already exited.
      
      Fix this by doing a sync setup of the manager. This also fixes the case
      where if we failed creating workers, we'd also get stuck.
      
      In practice this race window was really small, as we already wait for
      the manager to start. Hence someone would have to call io_wq_destroy()
      after the task has started, but before it started the first loop. The
      reported test case forked tons of these, which is why it became an
      issue.
      
      Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
      Fixes: 771b53d033e8 ("io-wq: small threadpool implementation for io_uring")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      4c628e9d
  9. 19 3月, 2020 3 次提交
  10. 18 3月, 2020 18 次提交