- 09 Apr, 2020 1 commit
-
-
Committed by Juergen Gross
fix #26417771 commit b9705d8778e7adc97de38f405f835a2426e14d84 upstream. Commit 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") causes a regression on some systems when the kernel is booted as Xen dom0: the system hangs in early boot. The reason is an endless loop in get_page_from_freelist() when the first zone looked at has no free memory. deferred_grow_zone() always returns true due to the following code snippet: /* If the zone is empty somebody else may have cleared out the zone */ if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, first_deferred_pfn)) { pgdat->first_deferred_pfn = ULONG_MAX; pgdat_resize_unlock(pgdat, &flags); return true; } This in turn results in the loop, as get_page_from_freelist() assumes forward progress can be made by doing some more struct page initialization. Link: http://lkml.kernel.org/r/20190620160821.4210-1-jgross@suse.com Fixes: 0e56acae4b4d ("mm: initialize MAX_ORDER_NR_PAGES at a time instead of doing larger sections") Signed-off-by: Juergen Gross <jgross@suse.com> Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
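The endless loop described in this commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `node_model`, `grow_zone_buggy`, `grow_zone_once` and `alloc_attempts` are hypothetical names, the iteration cap exists only so the model terminates, and the "report progress at most once" variant is one plausible way to break the loop, not necessarily what the upstream patch does.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node state (not the kernel's pg_data_t). */
struct node_model {
    unsigned long first_deferred_pfn;
};

/* Models the problematic path: the zone is empty, yet progress is claimed. */
static bool grow_zone_buggy(struct node_model *n)
{
    n->first_deferred_pfn = ULONG_MAX;
    return true;
}

/* Assumed alternative: claim progress only the first time the zone is seen. */
static bool grow_zone_once(struct node_model *n)
{
    unsigned long first = n->first_deferred_pfn;

    n->first_deferred_pfn = ULONG_MAX;
    return first != ULONG_MAX;
}

/*
 * Models a caller that retries a failing allocation for as long as the
 * grow function reports progress. The cap only keeps the model finite.
 */
static int alloc_attempts(bool (*grow)(struct node_model *), int cap)
{
    struct node_model n = { .first_deferred_pfn = 0 };
    int attempts = 0;

    do {
        attempts++; /* the allocation itself fails: no free memory */
    } while (attempts < cap && grow(&n));

    return attempts;
}
```

With the buggy variant the caller only stops at the artificial cap, while the second variant lets it give up after one retry.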
-
- 07 Apr, 2020 2 commits
-
-
Committed by Tianjia Zhang
fix #26344811 commit 5780b9abd530982c2bb1018e2c52c05ab3c30b45 upstream The IMA hash algorithm code already supports sm3, but sm3 is not yet in the Kconfig configuration list. Once it is added, both IMA and TPM2 can use sm3. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Tianjia Zhang
fix #26344811 commit 6a30e1b1dcad0ba94fae757f797812d7d8dcb72c upstream The name sm3-256 is defined in hash_algo_name in hash_info, but the algorithm name implemented in sm3_generic.c is sm3. As a result, the sm3-256 algorithm cannot be found in some application scenarios of the hash algorithm, and an ENOENT error occurs. For example, IMA, keys, and other subsystems that reference hash_algo_name all use the hash algorithm of sm3. Fixes: 5ca4c20c ("keys, trusted: select hash algorithm for TPM2 chips") Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Pascal van Leeuwen <pvanleeuwen@rambus.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Reviewed-by: Jia Zhang <zhang.jia@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
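The name mismatch can be sketched as a plain lookup by string. The table and `lookup_hash` below are hypothetical and stand in for the kernel's algorithm registry only loosely; they show why asking for "sm3-256" fails with ENOENT when the implementation registered itself as "sm3".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry: names under which implementations registered. */
static const char *const registered_algs[] = { "sha1", "sha256", "sm3" };

/* Returns the table index of the algorithm, or -ENOENT if no name matches. */
static int lookup_hash(const char *name)
{
    size_t count = sizeof(registered_algs) / sizeof(registered_algs[0]);

    for (size_t i = 0; i < count; i++) {
        if (strcmp(registered_algs[i], name) == 0)
            return (int)i;
    }
    return -ENOENT;
}
```

A caller that takes its name from a table containing "sm3-256" gets -ENOENT, while the same request using "sm3" succeeds.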
-
- 02 Apr, 2020 5 commits
-
-
Committed by Eric Biggers
fix #25967152 commit ca4463bf8438b403596edd0ec961ca0d4fbe0220 upstream The VT_DISALLOCATE ioctl can free a virtual console while tty_release() is still running, causing a use-after-free in con_shutdown(). This occurs because VT_DISALLOCATE considers a virtual console's 'struct vc_data' to be unused as soon as the corresponding tty's refcount hits 0. But actually it may be still being closed. Fix this by making vc_data be reference-counted via the embedded 'struct tty_port'. A newly allocated virtual console has refcount 1. Opening it for the first time increments the refcount to 2. Closing it for the last time decrements the refcount (in tty_operations::cleanup() so that it happens late enough), as does VT_DISALLOCATE. Reproducer: #include <fcntl.h> #include <linux/vt.h> #include <sys/ioctl.h> #include <unistd.h> int main() { if (fork()) { for (;;) close(open("/dev/tty5", O_RDWR)); } else { int fd = open("/dev/tty10", O_RDWR); for (;;) ioctl(fd, VT_DISALLOCATE, 5); } } KASAN report: BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129 CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 Call Trace: [...] con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278 release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514 tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629 tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789 [...] Allocated by task 129: [...] kzalloc include/linux/slab.h:669 [inline] vc_allocate drivers/tty/vt/vt.c:1085 [inline] vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066 con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229 tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline] tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline] tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035 [...] Freed by task 130: [...] 
kfree+0xbf/0x1e0 mm/slab.c:3757 vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline] vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660 [...] Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle") Cc: <stable@vger.kernel.org> # v3.4+ Reported-by: syzbot+522643ab5729b0421998@syzkaller.appspotmail.com Acked-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200322034305.210082-2-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
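The reference-counting scheme this commit message describes (refcount 1 on allocation, 2 after the first open, dropped by both the final close and VT_DISALLOCATE) can be modelled in a few lines. This is a minimal user-space sketch: `console_model` and its helpers are hypothetical names, and a `freed` flag stands in for the actual release of the object.

```c
#include <stdbool.h>

/* Hypothetical model of a reference-counted virtual console. */
struct console_model {
    int refcount;
    bool freed;
};

/* A newly allocated console starts with one reference. */
static void console_alloc(struct console_model *c)
{
    c->refcount = 1;
    c->freed = false;
}

/* Taken when the console is opened for the first time. */
static void console_get(struct console_model *c)
{
    c->refcount++;
}

/* Dropped by the final close and by disallocation; the last put frees. */
static void console_put(struct console_model *c)
{
    if (--c->refcount == 0)
        c->freed = true;
}
```

In this model, disallocating an open console only drops one reference, so the object survives until the close path drops the other.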
-
Committed by Eric Biggers
fix #25967152 commit 7cf64b18b0b96e751178b8d0505d8466ff5a448f upstream vt_in_use() dereferences console_driver->ttys[i] without proper locking. This is broken because the tty can be closed and freed concurrently. We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL' and skipping the check of tty_struct::count. But, looking at console_driver->ttys[i] isn't really appropriate anyway because even if it is NULL the tty can still be in the process of being closed. Instead, fix it by making vt_in_use() require console_lock() and check whether the vt is allocated and has port refcount > 1. This works since following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console" the port refcount is incremented while the vt is open. Reproducer (very unreliable, but it worked for me after a few minutes): #include <fcntl.h> #include <linux/vt.h> int main() { int fd, nproc; struct vt_stat state; char ttyname[16]; fd = open("/dev/tty10", O_RDONLY); for (nproc = 1; nproc < 8; nproc *= 2) fork(); for (;;) { sprintf(ttyname, "/dev/tty%d", rand() % 8); close(open(ttyname, O_RDONLY)); ioctl(fd, VT_GETSTATE, &state); } } KASAN report: BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline] BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657 Read of size 4 at addr ffff888065722468 by task syz-vt2/132 CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014 Call Trace: [...] vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline] vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657 tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660 [...] Allocated by task 136: [...] 
kzalloc include/linux/slab.h:669 [inline] alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982 tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334 tty_open_by_driver drivers/tty/tty_io.c:1987 [inline] tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035 [...] Freed by task 41: [...] kfree+0xbf/0x200 mm/slab.c:3757 free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177 release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468 process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264 worker_thread+0x8b/0xc80 kernel/workqueue.c:2410 [...] Fixes: 4001d7b7 ("vt: push down the tty lock so we can see what is left to tackle") Cc: <stable@vger.kernel.org> # v3.4+ Acked-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jiri Slaby
fix #25967152 commit e587e8f17433ddb26954f0edf5b2f95c42155ae9 upstream These two were macros. Switch them to static inlines, so that it's more understandable what they are doing. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jiri Slaby
fix #25967152 commit dce05aa6eec977f1472abed95ccd71276b9a3864 upstream Avoid global variables (namely sel_cons) by introducing vc_is_sel. It checks whether the parameter is the current selection console. This will help putting sel_cons to a struct later. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Link: https://lore.kernel.org/r/20200219073951.16151-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe
fix #26374723 commit 0b8c0ec7eedcd8f9f1a1f238d87f9b512b09e71a upstream. syzbot reports: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 9217 Comm: io_uring-sq Not tainted 5.4.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: io_sq_thread+0x1c7/0xa20 fs/io_uring.c:3274 kthread+0x361/0x430 kernel/kthread.c:255 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:352 Modules linked in: ---[ end trace f2e1a4307fbe2245 ]--- RIP: 0010:creds_are_invalid kernel/cred.c:792 [inline] RIP: 0010:__validate_creds include/linux/cred.h:187 [inline] RIP: 0010:override_creds+0x9f/0x170 kernel/cred.c:550 Code: ac 25 00 81 fb 64 65 73 43 0f 85 a3 37 00 00 e8 17 ab 25 00 49 8d 7c 24 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 
08 3c 03 0f 8e 96 00 00 00 41 8b 5c 24 10 bf RSP: 0018:ffff88809c45fda0 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000043736564 RCX: ffffffff814f3318 RDX: 0000000000000002 RSI: ffffffff814f3329 RDI: 0000000000000010 RBP: ffff88809c45fdb8 R08: ffff8880a3aac240 R09: ffffed1014755849 R10: ffffed1014755848 R11: ffff8880a3aac247 R12: 0000000000000000 R13: ffff888098ab1600 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007ffd51c40664 CR3: 0000000092641000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 which is caused by slab fault injection triggering a failure in prepare_creds(). We don't actually need to create a copy of the creds, as we're not modifying them; we just need a reference on the current task creds. This avoids the failure case as well, and propagates the const throughout the stack. Fixes: 181e448d8709 ("io_uring: async workers should inherit the user creds") Reported-by: syzbot+5320383e16029ba057ff@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
- 26 Mar, 2020 1 commit
-
-
Committed by Jens Axboe
to #25570445 commit c4a2ed72c9a61594b6afc23e1fbc78878d32b5a3 upstream. We return -EBUSY on submit when we have a CQ ring overflow backlog, but that can be a bit problematic if the application is using pure userspace poll of the CQ ring. For that case, if the ring briefly overflowed and we have pending entries in the backlog, the submit flushes the backlog successfully but still returns -EBUSY. If we're able to fully flush the CQ ring backlog, let the submission proceed. Reported-by: Dan Melnic <dmm@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
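The intended submit behaviour can be sketched with a toy ring model. Everything here is an assumption made for illustration: `ring_model`, `flush_backlog` and `submit` are hypothetical names, and the counters replace the real CQ ring and overflow list.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model: overflowed completions and free CQ ring slots. */
struct ring_model {
    unsigned int backlog;
    unsigned int cq_free;
};

/* Moves as many backlog entries into the ring as fit; true if none remain. */
static bool flush_backlog(struct ring_model *r)
{
    unsigned int n = r->backlog < r->cq_free ? r->backlog : r->cq_free;

    r->backlog -= n;
    r->cq_free -= n;
    return r->backlog == 0;
}

/* Rejects the submission only if the backlog could not be fully flushed. */
static int submit(struct ring_model *r, int to_submit)
{
    if (r->backlog && !flush_backlog(r))
        return -EBUSY;
    return to_submit;
}
```

In this model a briefly overflowed ring with enough free slots accepts the submission, and -EBUSY is returned only while entries remain in the backlog.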
-
- 25 Mar, 2020 5 commits
-
-
Committed by shanghui.lsh
fix #26111716 VF probing fails when it calls pci_enable_sriov(), because this function should only be called for a physfn. Fix this issue by adding a physfn check before the call. Fixes: 4a5d2b59 ("alinux: pci/iohub-sriov: Support for Alibaba PCIe IOHub SRIOV") Signed-off-by: shanghui.lsh <shanghui.lsh@alibaba-inc.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com> [ caspar: modify the commit log ] Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
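The guard described above can be sketched as follows. This is a user-space model with hypothetical names (`pci_dev_model`, `enable_sriov_model`, `probe_model`); the error code returned for a non-physical function is an assumption of the model, and the driver's real probe routine is not shown in the commit message.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a PCI function that is either a PF or a VF. */
struct pci_dev_model {
    bool is_physfn;
    int enabled_vfs;
};

/* Models an enable call that is only valid on a physical function. */
static int enable_sriov_model(struct pci_dev_model *dev, int nr_vfs)
{
    if (!dev->is_physfn)
        return -ENOSYS; /* assumed error code for this model */
    dev->enabled_vfs = nr_vfs;
    return 0;
}

/* Probe path with the added guard: VFs skip the enable call entirely. */
static int probe_model(struct pci_dev_model *dev, int nr_vfs)
{
    if (!dev->is_physfn)
        return 0;
    return enable_sriov_model(dev, nr_vfs);
}
```

Without the guard, probing a VF would propagate the error from the enable call and the probe would fail.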
-
Committed by Stephen Rothwell
fix #25924252 commit 7dcddef6f769d7e60691c732eb6d09cdb1d9df76 upstream An x86_64 allmodconfig build produces these errors: x86_64-linux-gnu-ld: kernel/sched/core.o: in function `cpuidle_poll_time': core.c:(.text+0x230): multiple definition of `cpuidle_poll_time'; arch/x86/kernel/process.o:process.c:(.text+0xc0): first defined here (and more) Fixes: 259231a04561 ("cpuidle: add poll_limit_ns to cpuidle_device structure") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Xu Yu
fix #25926771 Specifically, replace `val / 1000000` with `val >> 20` as an optimization. This also fixes a possible build error with ARCH=i386, which reports an undefined reference to `__udivdi3`. Fixes: 40969475 ("alinux: mm, memcg: record latency of memcg wmark reclaim") Signed-off-by: Xu Yu <xuyu@linux.alibaba.com> Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
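Note that `val >> 20` divides by 1048576 rather than 1000000, so the two expressions are not numerically identical; the shift result is smaller by roughly 4.6%. A short sketch with hypothetical helper names makes the difference concrete:

```c
#include <stdint.h>

/* Exact decimal scaling; on 32-bit targets a 64-bit division like this
 * may need a helper routine such as __udivdi3. */
static uint64_t scale_div(uint64_t val)
{
    return val / 1000000;
}

/* Shift-based approximation: divides by 2^20 = 1048576 instead. */
static uint64_t scale_shift(uint64_t val)
{
    return val >> 20;
}
```

For an input of 1000000000, `scale_div` returns 1000 while `scale_shift` returns 953, which is acceptable only where an approximate figure is enough.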
-
Committed by Yi Wang
fix #25924484 commit 9481b7f10c5a7f149048310c25510f0386eb6631 upstream. This fixes the following coccinelle warning: WARNING: return of 0/1 in function 'vmx_need_emulation_on_page_fault' with return type bool Return false instead of 0. Signed-off-by: Yi Wang <wang.yi59@zte.com.cn> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Xiaoguang Wang
fix #25369772 On blk-mq devices, we observed that even though IOPS is low, iostat shows very high svctm and util values, which is counter-intuitive. The root cause is that blk_account_io_start() calls part_round_stats() before the "rq->part = part" statement, so part_round_stats() counts an inflight request against the whole device but not against the specific partition, and then updates the whole device's io_ticks and time_in_queue with a stale part->stamp. To fix this issue, if a request's part is NULL, we simply don't count it as an inflight request for the whole device. Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 20 Mar, 2020 2 commits
-
-
Committed by Al Viro
commit 6404674acd596de41fd3ad5f267b4525494a891a upstream Brown paperbag time: fetching ->i_uid/->i_mode really should've been done from nd->inode. I even suggested that, but the reason for that has slipped through the cracks and I went for dir->d_inode instead - made for more "obvious" patch. Analysis: - at the entry into do_last() and all the way to step_into(): dir (aka nd->path.dentry) is known not to have been freed; so's nd->inode and it's equal to dir->d_inode unless we are already doomed to -ECHILD. inode of the file to get opened is not known. - after step_into(): inode of the file to get opened is known; dir might be pointing to freed memory/be negative/etc. - at the call of may_create_in_sticky(): guaranteed to be out of RCU mode; inode of the file to get opened is known and pinned; dir might be garbage. The last was the reason for the original patch. Except that at the do_last() entry we can be in RCU mode and it is possible that nd->path.dentry->d_inode has already changed under us. In that case we are going to fail with -ECHILD, but we need to be careful; nd->inode is pointing to valid struct inode and it's the same as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we should use that. Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com> Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@kernel.org Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late") Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe
commit b60fda6000a99a7ccac36005ab78b14b47c06de3 upstream We currently have a race where if setup is really slow, we can be calling io_wq_destroy() before we're done setting up. This will cause the caller to get stuck waiting for the manager to set things up, but the manager already exited. Fix this by doing a sync setup of the manager. This also fixes the case where if we failed creating workers, we'd also get stuck. In practice this race window was really small, as we already wait for the manager to start. Hence someone would have to call io_wq_destroy() after the task has started, but before it started the first loop. The reported test case forked tons of these, which is why it became an issue. Reported-by: syzbot+0f1cc17f85154f400465@syzkaller.appspotmail.com Fixes: 771b53d033e8 ("io-wq: small threadpool implementation for io_uring") Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
- 19 Mar, 2020 3 commits
-
-
Committed by Jens Axboe
commit 576a347b7af8abfbddc80783fb6629c2894d036e upstream. If we don't inherit the original task creds, then we can confuse users like fuse that pass creds in the request header. See link below on identical aio issue. Link: https://lore.kernel.org/linux-fsdevel/26f0d78e-99ca-2f1b-78b9-433088053a61@scylladb.com/T/#u Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
commit 576a347b7af8abfbddc80783fb6629c2894d036e upstream. We currently pass in 4 arguments outside of the bounded size. In preparation for adding one more argument, let's bundle them up in a struct to make it more readable. No functional changes in this patch. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
Committed by Jens Axboe
commit 7d7230652e7c788ef908536fd79f4cca077f269f upstream. For cancellation, we need to ensure that the work item stays valid for as long as ->cur_work is valid. Right now we can't safely dereference the work item even under the wqe->lock, because while the ->cur_work pointer will remain valid, the work could be completing and be freed in parallel. Only invoke ->get/put_work() on items we know that the caller queued themselves. Add IO_WQ_WORK_INTERNAL for io-wq to use, which is needed when we're queueing a flush item, for instance. Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
-
- 18 Mar, 2020 21 commits
-
-
Committed by YueHaibing
commit 1d3ff0950e2b40dc861b1739029649d03f591820 upstream. [ Fixes: CVE-2019-20096 ] If dccp_feat_push_change() fails, we forget to free the memory allocated by kmemdup() in dccp_feat_clone_sp_val(). Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: e8ef967a ("dccp: Registration routines for changing feature values") Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
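The leak pattern (duplicate a value, hand it to a registration step, forget to free the duplicate when that step fails) can be modelled in user space. All names below are hypothetical, `malloc`/`memcpy` stand in for `kmemdup()`, and the `live_allocs` counter exists only to make the leak observable in this sketch.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Counts outstanding copies so a leak is observable in this model. */
static int live_allocs;

/* Duplicates a value, standing in for kmemdup(). */
static void *clone_val(const void *src, size_t len)
{
    void *copy = malloc(len);

    if (copy) {
        memcpy(copy, src, len);
        live_allocs++;
    }
    return copy;
}

static void free_val(void *p)
{
    if (p) {
        free(p);
        live_allocs--;
    }
}

/* Stands in for the push step; takes ownership of val only on success. */
static int push_change(void *val, int should_fail, void **slot)
{
    if (should_fail)
        return -ENOMEM;
    *slot = val;
    return 0;
}

/* Registration with the error path fixed: the copy is freed on failure. */
static int register_change(const void *src, size_t len, int should_fail,
                           void **slot)
{
    void *copy = clone_val(src, len);
    int rc;

    if (!copy)
        return -ENOMEM;
    rc = push_change(copy, should_fail, slot);
    if (rc)
        free_val(copy);
    return rc;
}
```

Dropping the `free_val(copy)` call on the failure path reproduces the leak: `live_allocs` stays at 1 although nothing owns the copy.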
-
Committed by Jason Yan
commit f70267f379b5e5e11bdc5d72a56bf17e5feed01f upstream. [ Fixes: CVE-2019-19965 ] The discovering of sas port is driven by workqueue in libsas. When libsas is processing port events or phy events in workqueue, new events may rise up and change the state of some structures such as asd_sas_phy. This may cause some problems such as follows: ==>thread 1 ==>thread 2 ==>phy up ==>phy_up_v3_hw() ==>oob_mode = SATA_OOB_MODE; ==>phy down quickly ==>hisi_sas_phy_down() ==>sas_ha->notify_phy_event() ==>sas_phy_disconnected() ==>oob_mode = OOB_NOT_CONNECTED ==>workqueue wakeup ==>sas_form_port() ==>sas_discover_domain() ==>sas_get_port_device() ==>oob_mode is OOB_NOT_CONNECTED and device is wrongly taken as expander This at last lead to the panic when libsas trying to issue a command to discover the device. [183047.614035] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [183047.622896] Mem abort info: [183047.625762] ESR = 0x96000004 [183047.628893] Exception class = DABT (current EL), IL = 32 bits [183047.634888] SET = 0, FnV = 0 [183047.638015] EA = 0, S1PTW = 0 [183047.641232] Data abort info: [183047.644189] ISV = 0, ISS = 0x00000004 [183047.648100] CM = 0, WnR = 0 [183047.651145] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000b7df67be [183047.657834] [0000000000000058] pgd=0000000000000000 [183047.662789] Internal error: Oops: 96000004 [#1] SMP [183047.667740] Process kworker/u16:2 (pid: 31291, stack limit = 0x00000000417c4974) [183047.675208] CPU: 0 PID: 3291 Comm: kworker/u16:2 Tainted: G W OE 4.19.36-vhulk1907.1.0.h410.eulerosv2r8.aarch64 #1 [183047.687015] Hardware name: N/A N/A/Kunpeng Desktop Board D920S10, BIOS 0.15 10/22/2019 [183047.695007] Workqueue: 0000:74:02.0_disco_q sas_discover_domain [183047.700999] pstate: 20c00009 (nzCv daif +PAN +UAO) [183047.705864] pc : prep_ata_v3_hw+0xf8/0x230 [hisi_sas_v3_hw] [183047.711510] lr : prep_ata_v3_hw+0xb0/0x230 [hisi_sas_v3_hw] [183047.717153] sp : ffff00000f28ba60 
[183047.720541] x29: ffff00000f28ba60 x28: ffff8026852d7228 [183047.725925] x27: ffff8027dba3e0a8 x26: ffff8027c05fc200 [183047.731310] x25: 0000000000000000 x24: ffff8026bafa8dc0 [183047.736695] x23: ffff8027c05fc218 x22: ffff8026852d7228 [183047.742079] x21: ffff80007c2f2940 x20: ffff8027c05fc200 [183047.747464] x19: 0000000000f80800 x18: 0000000000000010 [183047.752848] x17: 0000000000000000 x16: 0000000000000000 [183047.758232] x15: ffff000089a5a4ff x14: 0000000000000005 [183047.763617] x13: ffff000009a5a50e x12: ffff8026bafa1e20 [183047.769001] x11: ffff0000087453b8 x10: ffff00000f28b870 [183047.774385] x9 : 0000000000000000 x8 : ffff80007e58f9b0 [183047.779770] x7 : 0000000000000000 x6 : 000000000000003f [183047.785154] x5 : 0000000000000040 x4 : ffffffffffffffe0 [183047.790538] x3 : 00000000000000f8 x2 : 0000000002000007 [183047.795922] x1 : 0000000000000008 x0 : 0000000000000000 [183047.801307] Call trace: [183047.803827] prep_ata_v3_hw+0xf8/0x230 [hisi_sas_v3_hw] [183047.809127] hisi_sas_task_prep+0x750/0x888 [hisi_sas_main] [183047.814773] hisi_sas_task_exec.isra.7+0x88/0x1f0 [hisi_sas_main] [183047.820939] hisi_sas_queue_command+0x28/0x38 [hisi_sas_main] [183047.826757] smp_execute_task_sg+0xec/0x218 [183047.831013] smp_execute_task+0x74/0xa0 [183047.834921] sas_discover_expander.part.7+0x9c/0x5f8 [183047.839959] sas_discover_root_expander+0x90/0x160 [183047.844822] sas_discover_domain+0x1b8/0x1e8 [183047.849164] process_one_work+0x1b4/0x3f8 [183047.853246] worker_thread+0x54/0x470 [183047.856981] kthread+0x134/0x138 [183047.860283] ret_from_fork+0x10/0x18 [183047.863931] Code: f9407a80 528000e2 39409281 72a04002 (b9405800) [183047.870097] kernel fault(0x1) notification starting on CPU 0 [183047.875828] kernel fault(0x1) notification finished on CPU 0 [183047.881559] Modules linked in: unibsp(OE) hns3(OE) hclge(OE) hnae3(OE) mem_drv(OE) hisi_sas_v3_hw(OE) hisi_sas_main(OE) [183047.892418] ---[ end trace 4cc26083fc11b783 ]--- [183047.897107] Kernel panic 
- not syncing: Fatal exception [183047.902403] kernel fault(0x5) notification starting on CPU 0 [183047.908134] kernel fault(0x5) notification finished on CPU 0 [183047.913865] SMP: stopping secondary CPUs [183047.917861] Kernel Offset: disabled [183047.921422] CPU features: 0x2,a2a00a38 [183047.925243] Memory Limit: none [183047.928372] kernel reboot(0x2) notification starting on CPU 0 [183047.934190] kernel reboot(0x2) notification finished on CPU 0 [183047.940008] ---[ end Kernel panic - not syncing: Fatal exception ]--- Fixes: 2908d778 ("[SCSI] aic94xx: new driver") Link: https://lore.kernel.org/r/20191206011118.46909-1-yanaijie@huawei.com Reported-by: Gao Chuan <gaochuan4@huawei.com> Reviewed-by: John Garry <john.garry@huawei.com> Signed-off-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Akeem G Abodunrin
commit bc8a76a152c5f9ef3b48104154a65a68a8b76946 upstream. [ Fixes: CVE-2019-14615 ] Intel ID: PSIRT-TA-201910-001 CVEID: CVE-2019-14615 Intel GPU Hardware prior to Gen11 does not clear EU state during a context switch. This can result in information leakage between contexts. For Gen8 and Gen9, hardware provides a mechanism for fast cleardown of the EU state, by issuing a PIPE_CONTROL with bit 27 set. We can use this in a context batch buffer to explicitly cleardown the state on every context switch. As this workaround is already in place for gen8, we can borrow the code verbatim for Gen9. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Cc: Kumar Valsan Prathap <prathap.kumar.valsan@intel.com> Cc: Chris Wilson <chris.p.wilson@intel.com> Cc: Balestrieri Francesco <francesco.balestrieri@intel.com> Cc: Bloomfield Jon <jon.bloomfield@intel.com> Cc: Dutt Sudeep <sudeep.dutt@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Navid Emamdoost
commit 4a9d46a9fe14401f21df69cea97c62396d5fb053 upstream. [ Fixes: CVE-2019-19077 ] In bnxt_re_create_srq(), when ib_copy_to_udata() fails, the allocated memory should be released by jumping to the fail label. Fixes: 37cb11ac ("RDMA/bnxt_re: Add SRQ support for Broadcom adapters") Link: https://lore.kernel.org/r/20190910222120.16517-1-navid.emamdoost@gmail.com Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Navid Emamdoost
commit 4aa7afb0ee20a97fbf0c5bab3df028d5fb85fdab upstream. [ Fixes: CVE-2019-19046 ] In the implementation of __ipmi_bmc_register(), the memory allocated for bmc should be released if ida_simple_get() fails. Fixes: 68e7e50f ("ipmi: Don't use BMC product/dev ids in the BMC name") Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Message-Id: <20191021200649.1511-1-navid.emamdoost@gmail.com> Signed-off-by: Corey Minyard <cminyard@mvista.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Jiri Slaby
commit 07e6124a1a46b4b5a9b3cacc0c306b50da87abf5 upstream. [ Fixes: CVE-2020-8648 ] syzkaller reported this UAF: BUG: KASAN: use-after-free in n_tty_receive_buf_common+0x2481/0x2940 drivers/tty/n_tty.c:1741 Read of size 1 at addr ffff8880089e40e9 by task syz-executor.1/13184 CPU: 0 PID: 13184 Comm: syz-executor.1 Not tainted 5.4.7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: ... kasan_report+0xe/0x20 mm/kasan/common.c:634 n_tty_receive_buf_common+0x2481/0x2940 drivers/tty/n_tty.c:1741 tty_ldisc_receive_buf+0xac/0x190 drivers/tty/tty_buffer.c:461 paste_selection+0x297/0x400 drivers/tty/vt/selection.c:372 tioclinux+0x20d/0x4e0 drivers/tty/vt/vt.c:3044 vt_ioctl+0x1bcf/0x28d0 drivers/tty/vt/vt_ioctl.c:364 tty_ioctl+0x525/0x15a0 drivers/tty/tty_io.c:2657 vfs_ioctl fs/ioctl.c:47 [inline] It is due to a race between parallel paste_selection (TIOCL_PASTESEL) and set_selection_user (TIOCL_SETSEL) invocations. One uses sel_buffer, while the other frees it and reallocates a new one for another selection. Add a mutex to close this race. The mutex takes care properly of sel_buffer and sel_buffer_lth only. The other selection global variables (like sel_start, sel_end, and sel_cons) are protected only in set_selection_user. The other functions need quite some more work to close the races of the variables there. This is going to happen later. This likely fixes (I am unsure as there is no reproducer provided) bug 206361 too. It was marked as CVE-2020-8648. Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-by: syzbot+59997e8d5cbdc486e6f6@syzkaller.appspotmail.com References: https://bugzilla.kernel.org/show_bug.cgi?id=206361 Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200210081131.23572-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Zhang Xiaoxu
commit 513dc792d6060d5ef572e43852683097a8420f56 upstream.

[ Fixes: CVE-2020-8647, CVE-2020-8649 ]

syzkaller testing found a use-after-free:

    BUG: KASan: use after free in vgacon_invert_region+0x9d/0x110 at addr ffff880000100000
    Read of size 2 by task syz-executor.1/16489
    page:ffffea0000004000 count:0 mapcount:-127 mapping: (null) index:0x0
    page flags: 0xfffff00000000()
    page dumped because: kasan: bad access detected
    CPU: 1 PID: 16489 Comm: syz-executor.1 Not tainted
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
    Call Trace:
     [<ffffffffb119f309>] dump_stack+0x1e/0x20
     [<ffffffffb04af957>] kasan_report+0x577/0x950
     [<ffffffffb04ae652>] __asan_load2+0x62/0x80
     [<ffffffffb090f26d>] vgacon_invert_region+0x9d/0x110
     [<ffffffffb0a39d95>] invert_screen+0xe5/0x470
     [<ffffffffb0a21dcb>] set_selection+0x44b/0x12f0
     [<ffffffffb0a3bfae>] tioclinux+0xee/0x490
     [<ffffffffb0a1d114>] vt_ioctl+0xff4/0x2670
     [<ffffffffb0a0089a>] tty_ioctl+0x46a/0x1a10
     [<ffffffffb052db3d>] do_vfs_ioctl+0x5bd/0xc40
     [<ffffffffb052e2f2>] SyS_ioctl+0x132/0x170
     [<ffffffffb11c9b1b>] system_call_fastpath+0x22/0x27
    Memory state around the buggy address:
     ffff8800000fff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     ffff8800000fff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff880000100000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

It can be reproduced on Linux mainline with the following program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <sys/ioctl.h>
    #include <linux/vt.h>

    struct tiocl_selection {
        unsigned short xs;       /* X start */
        unsigned short ys;       /* Y start */
        unsigned short xe;       /* X end */
        unsigned short ye;       /* Y end */
        unsigned short sel_mode; /* selection mode */
    };

    #define TIOCL_SETSEL 2

    struct tiocl {
        unsigned char type;
        unsigned char pad;
        struct tiocl_selection sel;
    };

    int main()
    {
        int fd = 0;
        const char *dev = "/dev/char/4:1";
        struct vt_consize v = {0};
        struct tiocl tioc = {0};

        fd = open(dev, O_RDWR, 0);

        v.v_rows = 3346;
        ioctl(fd, VT_RESIZEX, &v);

        tioc.type = TIOCL_SETSEL;
        ioctl(fd, TIOCLINUX, &tioc);

        return 0;
    }

When the screen is resized, 'vc->vc_size_row' is updated to the new row size. However, 'set_origin' in 'vgacon_set_origin' uses 'vga_vram_base' for 'vc_origin' and 'vc_visible_origin', not 'vc_screenbuf', and that region may be smaller than 'vc_screenbuf'. On TIOCLINUX, the new row size is used to calculate the offset, which may exceed 'vga_vram_size' in the vgacon driver and cause a bad access.

Also, if a larger screenbuf is set first and an even larger one is set afterwards, a bad access may happen when old_origin is copied to new_origin.

So, if the screen size is larger than vga_vram, resizing the screen should fail. This also fixes CVE-2020-8649 and CVE-2020-8647.

Linus pointed out that overflow checking seems absent. We're saved by the existing bounds checks in vc_do_resize() with rather strict limits:

    if (cols > VC_RESIZE_MAXCOL || lines > VC_RESIZE_MAXROW)
        return -EINVAL;

Fixes: 0aec4867 ("[PATCH] SVGATextMode fix")
Reference: CVE-2020-8647 and CVE-2020-8649
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
[danvet: augment commit message to point out overflow safety]
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20200304022429.37738-1-zhangxiaoxu5@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
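The rule stated above (a resize must fail when the requested screen does not fit in VGA video RAM) can be sketched as a standalone size check. This is only an illustration under stated assumptions, not the driver code: `struct vga_text_mem` and `check_resize()` are hypothetical names, and the sketch relies only on the fact that a VGA text cell occupies 2 bytes (character plus attribute).

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's view of VGA text memory. */
struct vga_text_mem {
    size_t vram_size; /* bytes of video RAM available for text cells */
};

/*
 * Return 0 if a cols x rows text screen fits in video RAM,
 * -EINVAL otherwise. Each text cell takes 2 bytes.
 */
static int check_resize(const struct vga_text_mem *mem,
                        unsigned int cols, unsigned int rows)
{
    /* Widen before multiplying so the product cannot wrap around. */
    unsigned long long needed = 2ULL * cols * rows;

    if (needed > mem->vram_size)
        return -EINVAL;
    return 0;
}
```

With a 32 KiB text window, an 80x25 screen (4000 bytes) passes, while the reproducer's 3346 rows at 80 columns would be rejected before any offset is computed from the new row size.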
-
Committed by Al Viro
commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream.

[ Fixes: CVE-2020-8428 ]

The may_create_in_sticky() call is done when we have already dropped the reference to dir.

Fixes: 30aba665 (namei: allow restricted O_CREAT of FIFOs and regular files)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Boris Ostrovsky
commit 8c6de56a42e0c657955e12b882a81ef07d1d073e upstream.

[ Fixes: CVE-2019-3016 ]

kvm_steal_time_set_preempted() may accidentally clear the KVM_VCPU_FLUSH_TLB bit if it is called more than once while the VCPU is preempted.

This is part of CVE-2019-3016.

(This bug was also independently discovered by Jim Mattson <jmattson@google.com>)

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
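To see why a repeated call can lose the flush request, here is a toy user-space model of the hazard described above. It is a sketch, not the KVM code: `struct vcpu_model` and `set_preempted()` are hypothetical, the two flag values mirror KVM_VCPU_PREEMPTED and KVM_VCPU_FLUSH_TLB, and the host-side guard shown is one assumed way of making the second call a no-op.

```c
#include <stdint.h>

#define VCPU_PREEMPTED (1u << 0) /* mirrors KVM_VCPU_PREEMPTED */
#define VCPU_FLUSH_TLB (1u << 1) /* mirrors KVM_VCPU_FLUSH_TLB */

/* Hypothetical model: one host-private flag, one byte shared with the guest. */
struct vcpu_model {
    uint8_t host_preempted;  /* host's record that it already marked the VCPU */
    uint8_t guest_preempted; /* shared byte; the guest may OR in VCPU_FLUSH_TLB */
};

static void set_preempted(struct vcpu_model *v)
{
    /*
     * Already marked: storing VCPU_PREEMPTED again would overwrite
     * a VCPU_FLUSH_TLB bit set in the shared byte in the meantime.
     */
    if (v->host_preempted)
        return;

    v->host_preempted = VCPU_PREEMPTED;
    v->guest_preempted = VCPU_PREEMPTED;
}
```

Without the early return, calling `set_preempted()` a second time after the flush bit was set would store a plain `VCPU_PREEMPTED` and silently drop the pending TLB flush.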
-
Committed by Oliver Upton
commit 35a571346a94fb93b5b3b6a599675ef3384bc75c upstream.

[ Fixes: CVE-2020-2732 ]

Consult the 'unconditional IO exiting' and 'use IO bitmaps' VM-execution controls when checking instruction interception. If the 'use IO bitmaps' VM-execution control is 1, check the instruction access against the IO bitmaps to determine if the instruction causes a VM-exit.

Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
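The decision described above can be sketched in plain C. This is a simplified user-space model, not the KVM helper: `struct io_controls` and `io_access_exits()` are hypothetical names, the two IO bitmaps are assumed to be concatenated into one 8192-byte array with one bit per port, and an access that runs past port 0xffff is treated as intercepted, which is a conservative choice made for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_BITMAP_BYTES 8192 /* one bit per port, 65536 ports */

/* Hypothetical view of the two VM-execution controls named in the text. */
struct io_controls {
    bool use_io_bitmaps;
    bool unconditional_io_exiting;
};

/* Return true if an IO access of `size` bytes at `port` should cause a VM-exit. */
static bool io_access_exits(const struct io_controls *ctl,
                            const uint8_t bitmap[IO_BITMAP_BYTES],
                            uint16_t port, unsigned int size)
{
    assert(size == 1 || size == 2 || size == 4);

    /* Without bitmaps, the unconditional control alone decides. */
    if (!ctl->use_io_bitmaps)
        return ctl->unconditional_io_exiting;

    /* With bitmaps, any touched port whose bit is set causes an exit. */
    for (unsigned int i = 0; i < size; i++) {
        uint32_t p = (uint32_t)port + i;

        if (p > 0xffff) /* access runs off the end of the port space */
            return true;
        if (bitmap[p / 8] & (1u << (p % 8)))
            return true;
    }
    return false;
}
```

Note that a 2-byte access at port 0x3f7 also touches 0x3f8, so every byte of the access is checked, not just the first port.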
-
Committed by Oliver Upton
commit e71237d3ff1abf9f3388337cfebf53b96df2020d upstream.

[ Fixes: CVE-2020-2732 ]

Checks against the IO bitmap are useful for both instruction emulation and VM-exit reflection. Refactor the IO bitmap checks into a helper function.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Paolo Bonzini
commit 07721feee46b4b248402133228235318199b05ec upstream.

[ Fixes: CVE-2020-2732 ]

vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default.

Cc: stable@vger.kernel.org
[Made commit, added commit msg - Oliver]
Signed-off-by: Oliver Upton <oupton@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Shile Zhang
commit 07447453db3aebb6a0917592f411a7122d12a8b9 upstream linux-next.

When 'CONFIG_DEFERRED_STRUCT_PAGE_INIT' is set, the 'pgdatinit' kthread initialises the deferred pages with local interrupts disabled. This behaviour was introduced by commit 3a2d7fa8 ("mm: disable interrupts while initializing deferred pages").

On a machine with NCPUS <= 2, the 'pgdatinit' kthread could be bound to the boot CPU, which could stall the tick timer for a long time, so the system jiffies are not updated in time.

dmesg then shows:

    [ 0.197975] node 0 initialised, 32170688 pages in 1ms

Obviously, 1ms is unreasonable.

Fix it by letting pending interrupts in after every 32*1024 pages (128MB) initialized, which gives the tick a chance to update the system jiffies. A reasonable dmesg then looks like:

    [ 1.069306] node 0 initialised, 32203456 pages in 894ms

Link: http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
Fixes: 3a2d7fa8 ("mm: disable interrupts while initializing deferred pages")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Co-developed-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
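As a quick check of the batch size mentioned above, the sketch below works out how much memory one batch covers and how many times an init loop would pause to let interrupts in. It assumes 4 KiB pages (the message only gives the 128MB figure) and uses hypothetical names `batch_bytes()` and `interrupt_windows()`; it models the counting only, not the kernel's page initialisation.

```c
/* Assumption: 4 KiB pages, which makes one batch equal to 128 MiB. */
#define PAGE_SIZE_BYTES 4096UL
#define BATCH_PAGES     (32UL * 1024UL)

/* Memory covered by one batch of deferred pages, in bytes. */
static unsigned long batch_bytes(void)
{
    return BATCH_PAGES * PAGE_SIZE_BYTES;
}

/*
 * Count how many times a loop over nr_pages would briefly let
 * pending interrupts run, pausing once after each full batch.
 */
static unsigned long interrupt_windows(unsigned long nr_pages)
{
    unsigned long windows = 0, done = 0;

    while (nr_pages - done >= BATCH_PAGES) {
        done += BATCH_PAGES; /* initialise one batch with interrupts off */
        windows++;           /* then let the pending tick be handled */
    }
    return windows;
}
```

For the 32203456 pages in the second dmesg line, this gives 982 interrupt windows over the whole initialisation, which is enough for the tick to keep jiffies moving.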
-
Committed by Xu Yu
This exports the workingset counters, i.e., workingset_refault, workingset_activate, workingset_restore, and workingset_nodereclaim, to memory cgroup v1.

The stat collection of these counters is shared between memory cgroup v1 and v2. All this patch does is export them on memory cgroup v1.

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Lingpeng Chen
commit e7a5f1f1cd0008e5ad379270a8657e121eedb669 upstream

Right now in tcp_bpf_recvmsg, the socket reads data from sk_receive_queue first if it is not empty, and from psock->ingress_msg otherwise. If a FIN packet arrives while there is also some data in psock->ingress_msg, the data in psock->ingress_msg will be purged. This always happens when sending a request to an HTTP/1.0 server such as Python's SimpleHTTPServer, since the server sends a FIN packet after the data is sent out.

Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface")
Reported-by: Arika Chen <eaglesora@gmail.com>
Suggested-by: Arika Chen <eaglesora@gmail.com>
Signed-off-by: Lingpeng Chen <forrest0579@gmail.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/20200109014833.18951-1-forrest0579@gmail.com
[tonylu: patch modified to match the BIG rework between v4.19 and upstream]
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
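The ordering problem described above can be illustrated with a small decision function. This is a hedged sketch of one ordering that avoids the loss (drain psock->ingress_msg before touching sk_receive_queue), not the patch itself; `enum read_source` and `pick_read_source()` are hypothetical names and the queues are reduced to two booleans.

```c
#include <stdbool.h>

/* Hypothetical labels for where the next read should come from. */
enum read_source {
    READ_NONE,          /* nothing queued anywhere */
    READ_INGRESS_MSG,   /* data redirected into psock->ingress_msg */
    READ_RECEIVE_QUEUE  /* regular sk_receive_queue (may hold only a FIN) */
};

/*
 * Prefer pending ingress data so that a FIN sitting in the receive
 * queue cannot cause that data to be purged unread.
 */
static enum read_source pick_read_source(bool receive_queue_empty,
                                         bool ingress_msg_empty)
{
    if (!ingress_msg_empty)
        return READ_INGRESS_MSG;
    if (!receive_queue_empty)
        return READ_RECEIVE_QUEUE;
    return READ_NONE;
}
```

In the HTTP/1.0 case from the message, both queues are non-empty (response data in ingress_msg, FIN in the receive queue), and this ordering hands the data to the reader before the FIN is processed.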
-
Committed by liushanghui
This enables SR-IOV for the PCIe devices of Alibaba MOC, so that the VFs can be used by another host or by VMs running on the host.

Signed-off-by: liushanghui <liushanghui@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
-
Committed by Xu Yu
Explicitly abort mem_cgroup_select_bad_process in priority oom if there is already a task selected as oom victim that does not have the MMF_OOM_SKIP flag set.

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
-
Committed by Xu Yu
Since commit e0205ae40f12 ("mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks()") made mem_cgroup_scan_tasks() check only one thread from each thread group, we can make cgroup_subsys_state::nr_tasks record only the thread group leader, i.e., the process, instead of the thread(s).

Furthermore, this renames cgroup_subsys_state::nr_tasks to cgroup_subsys_state::nr_procs.

Fixes: f061cd88 ("alinux: kernel: cgroup: account number of tasks in the css and its descendants")
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Tetsuo Handa
commit f168a9a54ec39b3f832c353733898b713b6b5c1f upstream.

Since commit c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works, mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check only one thread from each thread group.

[penguin-kernel@I-love.SAKURA.ne.jp: remove thread group leader check in oom_evaluate_task()]
Link: http://lkml.kernel.org/r/1560853257-14934-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
Link: http://lkml.kernel.org/r/c763afc8-f0ae-756a-56a7-395f625b95fc@i-love.sakura.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xu Yu
Assume that there is a memory cgroup tree as follows:

        A (use_priority_oom=1, limit=2.5G)
       / \
      /   C (priority=3, usage=1.5G)
     B (priority=0, usage=1G)

When a task in C (task-c) invokes the oom-killer, a task in B (task-b) is chosen and killed, and then task-c returns from mem_cgroup_oom and retries in try_charge. If the memory page_counter of B has not been reset yet, so that task-c invokes the oom-killer again, a soft lockup may happen.

In this situation, task-c keeps selecting a bad process in B, while the only task in B, task-b, already has the PF_EXITING flag set, which makes css_task_iter_advance skip it. As a result, task-c finds no bad process in B and keeps retrying, while task-b is stalled in synchronize_rcu during do_exit, in exit_task_namespaces specifically.

In a nutshell, the new behavior of css_task_iter_advance, i.e., commit c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations"), causes priority oom to misbehave.

This fixes the soft lockup by accounting num_oom_skip of the victim memcg and its parents (sifting up to oc->memcg) if no bad process is chosen from it.

Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Xunlei Pang <xlpang@linux.alibaba.com>
-
Committed by Xiaoguang Wang
commit 32b2244a840a90ea94ba42392de5c48d53f521f5 upstream linux-next

When SETUP_IOPOLL and SETUP_SQPOLL are both enabled, applications don't need to poll for io completion events themselves; they can rely on io_sq_thread to do the polling work, which reduces cpu usage and uring_lock contention.

I modified the fio io_uring engine code slightly to evaluate the performance:

    static int fio_ioring_getevents(struct thread_data *td, unsigned int min,
                            continue;
                    }

    -               if (!o->sqpoll_thread) {
    +               if (o->sqpoll_thread && o->hipri) {
                            r = io_uring_enter(ld, 0, actual_min,
                                                    IORING_ENTER_GETEVENTS);
                            if (r < 0) {

and used "fio -name=fiotest -filename=/dev/nvme0n1 -iodepth=$depth -thread -rw=read -ioengine=io_uring -hipri=1 -sqthread_poll=1 -direct=1 -bs=4k -size=10G -numjobs=1 -time_based -runtime=120"

original code
--------------------------------------------------------------------
iodepth       |        4 |        8 |       16 |       32 |       64
bw            | 1133MB/s | 1519MB/s | 2090MB/s | 2710MB/s | 3012MB/s
fio cpu usage |     100% |     100% |     100% |     100% |     100%
--------------------------------------------------------------------

with patch
--------------------------------------------------------------------
iodepth       |        4 |        8 |       16 |       32 |       64
bw            | 1196MB/s | 1721MB/s | 2351MB/s | 2977MB/s | 3357MB/s
fio cpu usage |    63.8% |    74.4% |    81.1% |    83.7% |    82.4%
--------------------------------------------------------------------
bw improve    |     5.5% |    13.2% |    12.3% |     9.8% |    11.5%
--------------------------------------------------------------------

From the test results above, bandwidth improves by about 5.5% to 13%, and the fio process's cpu usage also drops considerably. Note that this does not improve io_sq_thread's cpu usage: when SETUP_IOPOLL|SETUP_SQPOLL are both enabled, io_sq_thread always has 100% cpu usage. I think this patch will be friendly to applications that often use io_uring_wait_cqe() or similar from liburing.

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-