1. 16 4月, 2020 2 次提交
    • X
      alinux: mm, memcg: record latency of direct compact in every memcg · 4bec5cfe
      Xu Yu 提交于
      to #26424368
      
      Probe and calculate the latency of direct compact, and then group into
      the latency histogram in struct mem_cgroup.
      
      Note that the latency in each memcg is aggregated from all child memcgs.
      
      Usage:
      
      $ cat memory.direct_compact_latency
      0-1ms:  1176
      1-5ms:  259
      5-10ms:         17
      10-100ms:       10
      100-500ms:      0
      500-1000ms:     0
      >=1000ms:       0
      total(ms):      921
      
      Each line is the count of direct compact within the appropriate latency
      range.
      
      To clear the latency histogram:
      
      $ echo 0 > memory.direct_compact_latency
      $ cat memory.direct_compact_latency
      0-1ms:  0
      1-5ms:  0
      5-10ms:         0
      10-100ms:       0
      100-500ms:      0
      500-1000ms:     0
      >=1000ms:       0
      total(ms):      0
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      4bec5cfe
    • X
      alinux: mm, memcg: record latency of direct reclaim in every memcg · 83058e75
      Xu Yu 提交于
      to #26424368
      
      Probe and calculate the latency of global direct reclaim and memcg
      direct reclaim, respectively, and then group into the latency histogram
      in struct mem_cgroup. Besides, the total latency is accumulated each
      time the histogram is updated.
      
      Note that the latency in each memcg is aggregated from all child memcgs.
      
      Usage:
      
      $ cat memory.direct_reclaim_global_latency
      0-1ms:  228
      1-5ms:  283
      5-10ms:         0
      10-100ms:       0
      100-500ms:      0
      500-1000ms:     0
      >=1000ms:       0
      total(ms):      539
      
      Each line is the count of global direct reclaim within the appropriate
      latency range.
      
      To clear the latency histogram:
      
      $ echo 0 > memory.direct_reclaim_global_latency
      $ cat memory.direct_reclaim_global_latency
      0-1ms:  0
      1-5ms:  0
      5-10ms:         0
      10-100ms:       0
      100-500ms:      0
      500-1000ms:     0
      >=1000ms:       0
      total(ms):      0
      
      The usage of memory.direct_reclaim_memcg_latency is the same as
      memory.direct_reclaim_global_latency.
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
      83058e75
  2. 15 4月, 2020 1 次提交
    • X
      alinux: Revert "net: get rid of an signed integer overflow in ip_idents_reserve()" · a5f32c14
      xuanzhuo 提交于
      fix #24463023
      
      This reverts commit adb03115.
      
      Related Links:
          https://lkml.org/lkml/2019/7/24/243
          https://lore.kernel.org/lkml/b0160f4b-b996-b0ee-405a-3d5f1866272e@gmail.com/
          https://lore.kernel.org/lkml/20181101172739.GA3196@hirez.programming.kicks-ass.net/
      
      test methods:
         1. add dummy net dev. "ip link add pps_dummy0 type dummy"
         2. set the dummy dev with addr 10.10.10.1
         3. send numerous udp packets to 10.10.10.2(fake addr) by dummy dev
      
      test command: sockperf tp -m 14 -t 20 --mps=max -i 10.10.10.2 -p 11111
      
      By default, the ip_idents_reserve function will be called to distribute the
      identities of the identities in the ip layer without the DF flag. Use this
      method to stress test this function.
      
      After testing under vm, after the concurrent CPU exceeds 12, the old patch pps
      will no longer rise. The data for concurrent CPU 32 is as follows:
      
      test without atomic_add_return:
          11:32:10 AM pps_dummy0      0.00 8008897.00      0.00 437986.55      0.00      0.00      0.00
          11:32:11 AM pps_dummy0      0.00 7992910.00      0.00 437112.27      0.00      0.00      0.00
          11:32:12 AM pps_dummy0      0.00 7982553.00      0.00 436545.87      0.00      0.00      0.00
          11:32:13 AM pps_dummy0      0.00 7977757.00      0.00 436283.59      0.00      0.00      0.00
          11:32:14 AM pps_dummy0      0.00 7968355.00      0.00 435769.41      0.00      0.00      0.00
      
      test with atomic_add_return:
          11:33:20 AM pps_dummy0      0.00 16024069.00      0.00 876316.27      0.00      0.00      0.00
          11:33:21 AM pps_dummy0      0.00 16024252.00      0.00 876326.28      0.00      0.00      0.00
          11:33:22 AM pps_dummy0      0.00 16021639.00      0.00 876183.38      0.00      0.00      0.00
          11:33:23 AM pps_dummy0      0.00 16018738.00      0.00 876024.73      0.00      0.00      0.00
          11:33:24 AM pps_dummy0      0.00 16022333.00      0.00 876221.34      0.00      0.00      0.00
          11:33:25 AM pps_dummy0      0.00 16028147.00      0.00 876539.29      0.00      0.00      0.00
      Signed-off-by: Nxuanzhuo <xuanzhuo@linux.alibaba.com>
      Acked-by: NDust Li <dust.li@linux.alibaba.com>
      a5f32c14
  3. 13 4月, 2020 18 次提交
  4. 10 4月, 2020 1 次提交
  5. 09 4月, 2020 1 次提交
  6. 07 4月, 2020 2 次提交
  7. 02 4月, 2020 5 次提交
    • E
      vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console · c2d10e03
      Eric Biggers 提交于
      fix #25967152
      
      commit ca4463bf8438b403596edd0ec961ca0d4fbe0220 upstream
      
      The VT_DISALLOCATE ioctl can free a virtual console while tty_release()
      is still running, causing a use-after-free in con_shutdown().  This
      occurs because VT_DISALLOCATE considers a virtual console's
      'struct vc_data' to be unused as soon as the corresponding tty's
      refcount hits 0.  But actually it may be still being closed.
      
      Fix this by making vc_data be reference-counted via the embedded
      'struct tty_port'.  A newly allocated virtual console has refcount 1.
      Opening it for the first time increments the refcount to 2.  Closing it
      for the last time decrements the refcount (in tty_operations::cleanup()
      so that it happens late enough), as does VT_DISALLOCATE.
      
      Reproducer:
      	#include <fcntl.h>
      	#include <linux/vt.h>
      	#include <sys/ioctl.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		if (fork()) {
      			for (;;)
      				close(open("/dev/tty5", O_RDWR));
      		} else {
      			int fd = open("/dev/tty10", O_RDWR);
      
      			for (;;)
      				ioctl(fd, VT_DISALLOCATE, 5);
      		}
      	}
      
      KASAN report:
      	BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
      	Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129
      
      	CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
      	Call Trace:
      	 [...]
      	 con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
      	 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514
      	 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629
      	 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789
      	 [...]
      
      	Allocated by task 129:
      	 [...]
      	 kzalloc include/linux/slab.h:669 [inline]
      	 vc_allocate drivers/tty/vt/vt.c:1085 [inline]
      	 vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066
      	 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229
      	 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline]
      	 tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341
      	 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
      	 tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
      	 [...]
      
      	Freed by task 130:
      	 [...]
      	 kfree+0xbf/0x1e0 mm/slab.c:3757
      	 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline]
      	 vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818
      	 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
      	 [...]
      
      Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle")
      Cc: <stable@vger.kernel.org> # v3.4+
      Reported-by: syzbot+522643ab5729b0421998@syzkaller.appspotmail.com
      Acked-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20200322034305.210082-2-ebiggers@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      c2d10e03
    • E
      vt: vt_ioctl: fix use-after-free in vt_in_use() · ea64290b
      Eric Biggers 提交于
      fix #25967152
      
      commit 7cf64b18b0b96e751178b8d0505d8466ff5a448f upstream
      
      vt_in_use() dereferences console_driver->ttys[i] without proper locking.
      This is broken because the tty can be closed and freed concurrently.
      
      We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
      and skipping the check of tty_struct::count.  But, looking at
      console_driver->ttys[i] isn't really appropriate anyway because even if
      it is NULL the tty can still be in the process of being closed.
      
      Instead, fix it by making vt_in_use() require console_lock() and check
      whether the vt is allocated and has port refcount > 1.  This works since
      following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
      virtual console" the port refcount is incremented while the vt is open.
      
      Reproducer (very unreliable, but it worked for me after a few minutes):
      
      	#include <fcntl.h>
      	#include <linux/vt.h>
      
      	int main()
      	{
      		int fd, nproc;
      		struct vt_stat state;
      		char ttyname[16];
      
      		fd = open("/dev/tty10", O_RDONLY);
      		for (nproc = 1; nproc < 8; nproc *= 2)
      			fork();
      		for (;;) {
      			sprintf(ttyname, "/dev/tty%d", rand() % 8);
      			close(open(ttyname, O_RDONLY));
      			ioctl(fd, VT_GETSTATE, &state);
      		}
      	}
      
      KASAN report:
      
      	BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
      	BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
      	Read of size 4 at addr ffff888065722468 by task syz-vt2/132
      
      	CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
      	Call Trace:
      	 [...]
      	 vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
      	 vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
      	 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
      	 [...]
      
      	Allocated by task 136:
      	 [...]
      	 kzalloc include/linux/slab.h:669 [inline]
      	 alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
      	 tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
      	 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
      	 tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
      	 [...]
      
      	Freed by task 41:
      	 [...]
      	 kfree+0xbf/0x200 mm/slab.c:3757
      	 free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
      	 release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
      	 process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
      	 worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
      	 [...]
      
      Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle")
      Cc: <stable@vger.kernel.org> # v3.4+
      Acked-by: NJiri Slaby <jslaby@suse.cz>
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      ea64290b
    • J
      vt: ioctl, switch VT_IS_IN_USE and VT_BUSY to inlines · 5e8474ca
      Jiri Slaby 提交于
      fix #25967152
      
      commit e587e8f17433ddb26954f0edf5b2f95c42155ae9 upstream
      
      These two were macros. Switch them to static inlines, so that it's more
      understandable what they are doing.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Link: https://lore.kernel.org/r/20200219073951.16151-2-jslaby@suse.czSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      5e8474ca
    • J
      vt: selection, introduce vc_is_sel · 0b9c1057
      Jiri Slaby 提交于
      fix #25967152
      
      commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream
      
      Avoid global variables (namely sel_cons) by introducing vc_is_sel. It
      checks whether the parameter is the current selection console. This will
      help putting sel_cons to a struct later.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.czSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      0b9c1057
    • J
      io_uring: use current task creds instead of allocating a new one · 311b786d
      Jens Axboe 提交于
      fix #26374723
      
      commit 0b8c0ec7eedcd8f9f1a1f238d87f9b512b09e71a upstream.
      
      syzbot reports:
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
        io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274
        kthread+0x361/0x430 kernel/kthread.c:255
        ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352
      Modules linked in:
      ---[ end trace f2e1a4307fbe2245 ]---
      RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline]
      RIP: 0010:__validate_creds include/linux/cred.h:187 [inline]
      RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550
      Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c
      24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84
      c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf
      RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318
      RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010
      RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849
      R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000
      R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000
      FS:  0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      
      which is caused by slab fault injection triggering a failure in
      prepare_creds(). We don't actually need to create a copy of the creds
      as we're not modifying it, we just need a reference on the current task
      creds. This avoids the failure case as well, and propagates the const
      throughout the stack.
      
      Fixes: 181e448d8709 ("io_uring: async workers should inherit the user creds")
      Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Acked-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      311b786d
  8. 26 3月, 2020 1 次提交
  9. 25 3月, 2020 5 次提交
  10. 20 3月, 2020 2 次提交
    • A
      vfs: fix do_last() regression · 6073719d
      Al Viro 提交于
      commit 6404674acd596de41fd3ad5f267b4525494a891a upstream
      
      Brown paperbag time: fetching ->i_uid/->i_mode really should've been
      done from nd->inode.  I even suggested that, but the reason for that has
      slipped through the cracks and I went for dir->d_inode instead - made
      for more "obvious" patch.
      
      Analysis:
      
       - at the entry into do_last() and all the way to step_into(): dir (aka
         nd->path.dentry) is known not to have been freed; so's nd->inode and
         it's equal to dir->d_inode unless we are already doomed to -ECHILD.
         inode of the file to get opened is not known.
      
       - after step_into(): inode of the file to get opened is known; dir
         might be pointing to freed memory/be negative/etc.
      
       - at the call of may_create_in_sticky(): guaranteed to be out of RCU
         mode; inode of the file to get opened is known and pinned; dir might
         be garbage.
      
      The last was the reason for the original patch.  Except that at the
      do_last() entry we can be in RCU mode and it is possible that
      nd->path.dentry->d_inode has already changed under us.
      
      In that case we are going to fail with -ECHILD, but we need to be
      careful; nd->inode is pointing to valid struct inode and it's the same
      as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
      should use that.
      Reported-by: N"Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
      Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
      Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@kernel.org
      Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      6073719d
    • J
      io-wq: wait for io_wq_create() to setup necessary workers · 4c628e9d
      Jens Axboe 提交于
      commit b60fda6000a99a7ccac36005ab78b14b47c06de3 upstream
      
      We currently have a race where if setup is really slow, we can be
      calling io_wq_destroy() before we're done setting up. This will cause
      the caller to get stuck waiting for the manager to set things up, but
      the manager already exited.
      
      Fix this by doing a sync setup of the manager. This also fixes the case
      where if we failed creating workers, we'd also get stuck.
      
      In practice this race window was really small, as we already wait for
      the manager to start. Hence someone would have to call io_wq_destroy()
      after the task has started, but before it started the first loop. The
      reported test case forked tons of these, which is why it became an
      issue.
      
      Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com
      Fixes: 771b53d033e8 ("io-wq: small threadpool implementation for io_uring")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NXiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      4c628e9d
  11. 19 3月, 2020 2 次提交