- 01 June 2023, 1 commit
-
-
Submitted by Hui Tang

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7A718

--------------------------------

Performance regresses with the 'Fixes' commit when running
"./lat_ctx -P $SYNC_MAX -s 64 16". The 'Fixes' commit allocates memory
for p->prefer_cpus even if "prefer_cpus" has not been set. Before the
'Fixes' commit, only "p->prefer_cpus" was tested; afterwards,
"!cpumask_empty(p->prefer_cpus)" is tested as well, which causes the
performance degradation:

    select_task_rq_fair
      ->set_task_select_cpus
        ->prefer_cpus_valid  ---- tests cpumask_empty(p->prefer_cpus)

Fixes: ebeb84ad ("cpuset: Introduce new interface for scheduler ...")
Signed-off-by: Hui Tang <tanghui20@huawei.com>
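
A minimal sketch of the wakeup-path check discussed above, assuming
openEuler's dynamic-affinity fields (the exact openEuler code may
differ):

    /* illustrative sketch of prefer_cpus_valid() */
    static inline bool prefer_cpus_valid(struct task_struct *p)
    {
        return p->prefer_cpus &&
               !cpumask_empty(p->prefer_cpus) && /* the costly extra test */
               !cpumask_equal(p->prefer_cpus, p->cpus_ptr) &&
               cpumask_subset(p->prefer_cpus, p->cpus_ptr);
    }

Testing the cheap p->prefer_cpus pointer first lets the common
"prefer_cpus not set" case bail out before touching any cpumask.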
-
- 18 May 2023, 1 commit
-
-
Submitted by Tom_zc

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6N4EP
CVE: NA

--------------------------------

openEuler enables bcache by default on x86 platforms. Fix the kABI
change caused by enabling bcache.

Signed-off-by: Chao Zhu <tom_toworld@163.com>
-
- 08 April 2023, 3 commits
-
-
Submitted by tanghui

hulk inclusion
category: feature
bugzilla: 186575, https://gitee.com/openeuler/kernel/issues/I526XC

--------------------------------

Signed-off-by: tanghui <tanghui20@huawei.com>
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
-
Submitted by tanghui

hulk inclusion
category: feature
bugzilla: 186575, https://gitee.com/openeuler/kernel/issues/I526XC

--------------------------------

Compare the task group's 'util_avg' on the preferred CPUs with the
capacity of the preferred CPUs, and dynamically adjust the CPU range
for the task wakeup process.

Signed-off-by: tanghui <tanghui20@huawei.com>
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
-
Submitted by tanghui

hulk inclusion
category: feature
bugzilla: 186575, https://gitee.com/openeuler/kernel/issues/I526XC

--------------------------------

Add a 'prefer_cpus' sysfs file and related interfaces in the cgroup
cpuset.

Signed-off-by: tanghui <tanghui20@huawei.com>
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
-
- 28 February 2023, 2 commits
-
-
Submitted by Li Lingfeng

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6BTWC

-------------------

Commit 5125c6de8709 ("[Backport] io_uring: import 5.15-stable
io_uring") changes some structs, so we need to fix the kABI breakage.

Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
Submitted by Jens Axboe

stable inclusion
from stable-v5.10.162
commit 788d0824269bef539fe31a785b1517882eafed93
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6BTWC
CVE: CVE-2023-0240
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v5.10.167&id=788d0824269bef539fe31a785b1517882eafed93

--------------------------------

No upstream commit exists.

This imports the io_uring codebase from 5.15.85, wholesale. Changes
from that code base:

- Drop IOCB_ALLOC_CACHE, we don't have that in 5.10.
- Drop MKDIRAT/SYMLINKAT/LINKAT. Would require further VFS backports,
  and we don't support these in 5.10 to begin with.
- sock_from_file() old style calling convention.
- Use compat_get_bitmap() only for CONFIG_COMPAT=y

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
-
- 02 December 2022, 1 commit
-
-
Submitted by Hui Su

stable inclusion
from stable-v5.10.140
commit c30c0f720533c6bcab2e16b90d302afb50a61d2a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c30c0f720533c6bcab2e16b90d302afb50a61d2a

--------------------------------

commit 0e387249 upstream.

Since commit 2279f540 ("sched/deadline: Fix priority inheritance with
multiple scheduling classes"), we should not keep it here.

Signed-off-by: Hui Su <suhui_kernel@163.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lore.kernel.org/r/20220107095254.GA49258@localhost.localdomain
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
-
- 29 November 2022, 2 commits
-
-
Submitted by Chen Hui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KUFB
CVE: NA

--------------------------------

Add three hooks of sched type in select_task_rq_fair(), as follows:

'cfs_select_rq'
    Replace the original core-selection policy or implement dynamic
    CPU affinity.

'cfs_select_rq_exit'
    Restore the CPU affinity of the task before exiting
    select_task_rq_fair(). To be used with the 'cfs_select_rq' hook
    to implement dynamic CPU affinity.

'cfs_wake_affine'
    Determine on which CPU the task can run soonest. Allows the user
    to implement different policies.

Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
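
A hedged sketch of where the hooks listed above would sit in the
wakeup path (hook names from the text; the run_*_hook() helpers and
their signatures are assumptions for illustration, not the actual
openEuler definitions):

    /* illustrative only */
    static int
    select_task_rq_fair(struct task_struct *p, int prev_cpu,
                        int sd_flag, int wake_flags)
    {
        /* 'cfs_select_rq': may replace the core-selection policy */
        int new_cpu = run_cfs_select_rq_hook(p, prev_cpu, wake_flags);

        if (new_cpu >= 0)
            goto out;

        /* ... original fast-path/slow-path CPU selection ... */
        new_cpu = prev_cpu;
    out:
        /* 'cfs_select_rq_exit': restore the task's CPU affinity */
        run_cfs_select_rq_exit_hook(p);
        return new_cpu;
    }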
-
Submitted by Chen Hui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KUFB
CVE: NA

--------------------------------

Add a collection of cpumask ops, such as cpumask_empty, cpumask_and,
cpumask_andnot, cpumask_subset, cpumask_equal and cpumask_copy.

Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
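
For reference, a minimal sketch of the in-kernel cpumask operations the
collection covers (this is the standard <linux/cpumask.h> API; how the
ops are exposed to bpf programs is not shown here):

    #include <linux/cpumask.h>

    static void cpumask_ops_demo(const struct cpumask *a,
                                 const struct cpumask *b)
    {
        cpumask_var_t dst;

        if (!alloc_cpumask_var(&dst, GFP_KERNEL))
            return;

        cpumask_and(dst, a, b);      /* dst = a & b */
        cpumask_andnot(dst, a, b);   /* dst = a & ~b */
        cpumask_copy(dst, a);        /* dst = a */

        pr_info("empty=%d subset=%d equal=%d\n",
                cpumask_empty(dst),
                cpumask_subset(a, b), /* a subset of b? */
                cpumask_equal(a, b));

        free_cpumask_var(dst);
    }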
-
- 25 November 2022, 3 commits
-
-
Submitted by Chen Hui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KUFB
CVE: NA

--------------------------------

Add helper functions to get CPU statistics, as follows:
1. acquire cfs/rt/irq CPU load statistics;
2. acquire multiple types of nr_running statistics;
3. acquire CPU idle statistics;
4. acquire CPU capacity.

Based on CPU statistics in different dimensions, specific scheduling
policies can be implemented in a bpf program.

Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
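
A hedged sketch of what such a statistics snapshot might look like
(struct and field names are assumptions for illustration, not the
actual openEuler definitions):

    /* illustrative only */
    struct bpf_sched_cpu_stats {
        /* 1. load statistics */
        unsigned long cfs_load;
        unsigned long rt_load;
        unsigned long irq_load;
        /* 2. nr_running statistics */
        unsigned int  nr_running;
        unsigned int  cfs_nr_running;
        unsigned int  rt_nr_running;
        /* 3. idle statistics */
        unsigned int  idle;
        /* 4. capacity */
        unsigned long capacity;
    };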
-
Submitted by Chen Hui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KUFB
CVE: NA

--------------------------------

Add a user interface for the task group tag, bridging the information
gap between user mode and kernel mode.

Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
-
Submitted by Chen Hui

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5KUFB
CVE: NA

--------------------------------

Add a tag for the task, useful to identify special tasks. Users can
use the filesystem interface to mark different tags for specific
workloads. Kernel subsystems can use the set_* helpers to mark it,
too. A bpf prog obtains the tags to detect different workloads.

Signed-off-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Ren Zhijie <renzhijie2@huawei.com>
Signed-off-by: Hui Tang <tanghui20@huawei.com>
-
- 24 November 2022, 2 commits
-
-
Submitted by Xiaochen Shen

category: bugfix
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596WO
CVE: NA

Intel-SIG: sched: Fix kABI for task->pasid_activated.

-------------------------------------------------

In commit ("sched: Define and initialize a flag to identify valid
PASID in the task"), a new single-bit field 'pasid_activated' was
added to the middle of 'struct task_struct', which causes kABI
breakage.

Fix it by using the KABI_FILL_HOLE() API for task->pasid_activated.

Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
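
A minimal sketch of the fix, assuming openEuler's KABI_FILL_HOLE()
helper; where exactly the bit lands inside task_struct is illustrative:

    /* illustrative only */
    struct task_struct {
        /* ... existing bitfields ... */
        KABI_FILL_HOLE(unsigned pasid_activated:1)
        /* ... */
    };

KABI_FILL_HOLE() places the new bitfield into existing structure
padding, so the structure layout, and therefore the kABI, stays
unchanged.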
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.18
commit a3d29e82
category: feature
bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596WO
CVE: NA

Intel-SIG: commit a3d29e82 sched: Define and initialize a flag to
identify valid PASID in the task. Incremental backporting patches for
DSA/IAA on Intel Xeon platform.

--------------------------------

Add a new single-bit field to the task structure to track whether this
task has initialized the IA32_PASID MSR to the mm's PASID.

Initialize the field to zero when creating a new task with fork/clone.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Co-developed-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220207230254.3342514-8-fenghua.yu@intel.com
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
-
- 21 November 2022, 4 commits
-
-
Submitted by Ma Wupeng

mm: oom_kill: fix KABI broken by "oom_kill.c: futex: delay the OOM
reaper to allow time for proper futex cleanup"

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP
CVE: NA

-------------------------------

Move oom_reaper_timer from task_struct to task_struct_resvd to fix the
kABI breakage.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Nico Pache

mainline inclusion
from mainline-v5.18-rc4
commit e4a38402
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I61FDP
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e4a38402c36e42df28eb1a5394be87e6571fb48a

--------------------------------

The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which
can be targeted by the oom reaper. This mapping is used to store the
futex robust list head; the kernel does not keep a copy of the robust
list and instead references a userspace address to maintain the
robustness during a process death.

A race can occur between exit_mm and the oom reaper that allows the
oom reaper to free the memory of the futex robust list before the
exit path has handled the futex death:

    CPU1                            CPU2
    ----------------------------------------------------------------
    page_fault
    do_exit "signal"
    wake_oom_reaper
                                    oom_reaper
                                    oom_reap_task_mm (invalidates mm)
    exit_mm
    exit_mm_release
    futex_exit_release
    futex_cleanup
    exit_robust_list
    get_user (EFAULT- can't access memory)

If the get_user EFAULT's, the kernel will be unable to recover the
waiters on the robust_list, leaving userspace mutexes hung
indefinitely.

Delay the OOM reaper, allowing more time for the exit path to perform
the futex cleanup.

Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer

Based on a patch by Michal Hocko.

Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1]
Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com
Fixes: 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently")
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Nico Pache <npache@redhat.com>
Co-developed-by: Joel Savitz <jsavitz@redhat.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Herton R. Krzesinski <herton@redhat.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joel Savitz <jsavitz@redhat.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
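
A condensed sketch of the delay mechanism, modeled on the upstream fix
(constant and flag names as in mainline; error paths trimmed):

    #define OOM_REAPER_DELAY (2*HZ)

    static void queue_oom_reaper(struct task_struct *tsk)
    {
        /* mm is already queued? */
        if (test_and_set_bit(MMF_OOM_REAP_QUEUED,
                             &tsk->signal->oom_mm->flags))
            return;

        get_task_struct(tsk);
        timer_setup(&tsk->oom_reaper_timer, wake_oom_reaper, 0);
        tsk->oom_reaper_timer.expires = jiffies + OOM_REAPER_DELAY;
        add_timer(&tsk->oom_reaper_timer);
    }

The timer gives exit_mm_release()/futex_cleanup() a two-second head
start before wake_oom_reaper() hands the mm to the reaper thread.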
-
Submitted by Zheng Zucheng

hulk inclusion
category: feature
bugzilla: 187196, https://gitee.com/openeuler/kernel/issues/I612CS

-------------------------------

Allocate a new task_struct_resvd object for the recently cloned task.

Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Zhang Qiao <zhangqiao22@huawei.com>
Reviewed-by: Nanyong Sun <sunnanyong@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Tong Tiangen

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5GB28
CVE: NA

-------------------------------

In dump_user_range() processing, the data of the user process is
dumped to the corefile. When a hardware memory error is encountered
during the dump, only the relevant processes are affected, so killing
the user process and isolating the user page with hardware memory
errors is a more reasonable choice than a kernel panic.

The typical usage scenario of dump_user_range() is coredump. Writing
the coredump file to a filesystem depends on that filesystem's
write_iter implementation. This patch only supports two typical fs
write functions (_copy_from_iter/iov_iter_copy_from_user_atomic),
which are used by ext4/tmpfs/pipefs.

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
-
- 18 November 2022, 1 commit
-
-
Submitted by Waiman Long

stable inclusion
from stable-v5.10.137
commit 336626564b58071b8980a4e6a31a8f5d92705d9b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=336626564b58071b8980a4e6a31a8f5d92705d9b

--------------------------------

[ Upstream commit b6e8d40d ]

With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating
that the cpuset will just use the effective CPUs of its parent. So
cpuset_can_attach() can call task_can_attach() with an empty mask.
This can lead to cpumask_any_and() returning nr_cpu_ids causing the
call to dl_bw_of() to crash due to percpu value access of an out of
bound CPU value. For example:

    [80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0
      :
    [80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0
      :
    [80468.207946] Call Trace:
    [80468.208947]  cpuset_can_attach+0xa0/0x140
    [80468.209953]  cgroup_migrate_execute+0x8c/0x490
    [80468.210931]  cgroup_update_dfl_csses+0x254/0x270
    [80468.211898]  cgroup_subtree_control_write+0x322/0x400
    [80468.212854]  kernfs_fop_write_iter+0x11c/0x1b0
    [80468.213777]  new_sync_write+0x11f/0x1b0
    [80468.214689]  vfs_write+0x1eb/0x280
    [80468.215592]  ksys_write+0x5f/0xe0
    [80468.216463]  do_syscall_64+0x5c/0x80
    [80468.224287]  entry_SYSCALL_64_after_hwframe+0x44/0xae

Fix that by using effective_cpus instead. For cgroup v1,
effective_cpus is the same as cpus_allowed. For v2, effective_cpus is
the real cpumask to be used by tasks within the cpuset anyway.

Also update task_can_attach()'s 2nd argument name to cs_effective_cpus
to reflect the change. In addition, a check is added to
task_can_attach() to guard against the possibility that
cpumask_any_and() may return a value >= nr_cpu_ids.

Fixes: 7f51412a ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
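
A condensed sketch of the added guard, modeled on the upstream commit
(surrounding deadline-bandwidth logic trimmed):

    int task_can_attach(struct task_struct *p,
                        const struct cpumask *cs_effective_cpus)
    {
        int ret = 0;
        /* ... */
        if (dl_task(p) && !cpumask_intersects(task_rq(p)->rd->span,
                                              cs_effective_cpus)) {
            int cpu = cpumask_any_and(cpu_active_mask,
                                      cs_effective_cpus);

            if (unlikely(cpu >= nr_cpu_ids))
                return -EINVAL; /* empty mask: nothing to check */

            ret = dl_cpu_busy(cpu, p);
        }
        /* ... */
        return ret;
    }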
-
- 30 September 2022, 6 commits
-
-
Submitted by Lin Shengwang

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA

--------------------------------------------------------------------------

Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Chris Hyser

mainline inclusion
from mainline-v5.14-rc1
commit 7ac592aa
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7ac592aa35a684ff1858fb9ec282886b9e3575ac

--------------------------------------------------------------------------

This patch provides support for setting and copying core scheduling
'task cookies' between threads (PID), processes (TGID), and process
groups (PGID).

The value of core scheduling isn't that tasks don't share a core,
'nosmt' can do that. The value lies in exploiting all the sharing
opportunities that exist to recover possible lost performance and
that requires a degree of flexibility in the API.

From a security perspective (and there are others), the thread,
process and process group distinction is an existent hierarchal
categorization of tasks that reflects many of the security concerns
about 'data sharing'. For example, protecting against cache-snooping
by a thread that can just read the memory directly isn't all that
useful.

With this in mind, subcommands to CREATE/SHARE (TO/FROM) provide a
mechanism to create and share cookies. CREATE/SHARE_TO specify a
target pid with enum pidtype used to specify the scope of the
targeted tasks. For example, PIDTYPE_TGID will share the cookie with
the process and all of its threads as typically desired in a security
scenario.

API:

    prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, tgtpid, pidtype, &cookie)
    prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, tgtpid, pidtype, NULL)
    prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_TO, tgtpid, pidtype, NULL)
    prctl(PR_SCHED_CORE, PR_SCHED_CORE_SHARE_FROM, srcpid, pidtype, NULL)

where 'tgtpid/srcpid == 0' implies the current process and pidtype is
kernel enum pid_type {PIDTYPE_PID, PIDTYPE_TGID, PIDTYPE_PGID, ...}.

For return values, EINVAL, ENOMEM are what they say. ESRCH means the
tgtpid/srcpid was not found. EPERM indicates lack of PTRACE permission
access to tgtpid/srcpid. ENODEV indicates your machine lacks SMT.

[peterz: complete rewrite]
Signed-off-by: Chris Hyser <chris.hyser@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123309.039845339@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
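
A short userspace sketch of the API quoted above (the PR_SCHED_CORE_*
values below match the uapi header added by this feature; fallback
defines are provided in case the toolchain headers predate it):

    #include <stdio.h>
    #include <sys/prctl.h>

    #ifndef PR_SCHED_CORE
    #define PR_SCHED_CORE           62
    #define PR_SCHED_CORE_GET       0
    #define PR_SCHED_CORE_CREATE    1
    #define PR_SCHED_CORE_SHARE_TO  2
    #endif

    int main(void)
    {
        unsigned long cookie = 0;

        /* Create a cookie for the whole current process:
         * pidtype 1 == PIDTYPE_TGID, pid 0 == current. */
        if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_CREATE, 0, 1, NULL))
            perror("PR_SCHED_CORE_CREATE");

        /* Read the cookie back for this thread (0 == PIDTYPE_PID). */
        if (prctl(PR_SCHED_CORE, PR_SCHED_CORE_GET, 0, 0, &cookie))
            perror("PR_SCHED_CORE_GET");

        printf("core cookie: %#lx\n", cookie);
        return 0;
    }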
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.14-rc1
commit 85dd3f61
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=85dd3f61203c5cfa72b308ff327b5fbf3fc1ce5e

--------------------------------------------------------------------------

Note that sched_core_fork() is called from under tasklist_lock, and
not from sched_fork() earlier. This avoids a few races later.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.980003687@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.14-rc1
commit 6e33cad0
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6e33cad0af49336952e5541464bd02f5b5fd433e

--------------------------------------------------------------------------

In order to not have to use pid_struct, create a new, smaller,
structure to manage task cookies for core scheduling.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.919768100@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
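
A condensed sketch of the refcounted cookie object, modeled on the
upstream kernel/sched/core_sched.c (locking and rq updates trimmed):

    struct sched_core_cookie {
        refcount_t refcnt;
    };

    static unsigned long sched_core_alloc_cookie(void)
    {
        struct sched_core_cookie *ck = kmalloc(sizeof(*ck), GFP_KERNEL);

        if (!ck)
            return 0;

        refcount_set(&ck->refcnt, 1);
        return (unsigned long)ck;   /* the opaque cookie value */
    }

    static void sched_core_put_cookie(unsigned long cookie)
    {
        struct sched_core_cookie *ptr =
            (struct sched_core_cookie *)cookie;

        if (ptr && refcount_dec_and_test(&ptr->refcnt))
            kfree(ptr);
    }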
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.14-rc1
commit d2dfa17b
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2dfa17bc7de67e99685c4d6557837bf801a102c

--------------------------------------------------------------------------

When a sibling is forced-idle to match the core-cookie; search for
matching tasks to fill the core.

rcu_read_unlock() can incur an infrequent deadlock in
sched_core_balance(). Fix this by using the RCU-sched flavor instead.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.800048269@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Peter Zijlstra

mainline inclusion
from mainline-v5.14-rc1
commit 8a311c74
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I5OOWG
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8a311c740b53324ec584e0e3bb7077d56b123c28

--------------------------------------------------------------------------

Introduce task_struct::core_cookie as an opaque identifier for core
scheduling. When enabled; core scheduling will only allow matching
tasks to be on the core; where idle matches everything.

When task_struct::core_cookie is set (and core scheduling is enabled)
these tasks are indexed in a second RB-tree, first on cookie value
then on scheduling function, such that matching task selection always
finds the most eligible match.

NOTE: *shudder* at the overhead...

NOTE: *sigh*, a 3rd copy of the scheduling function; the alternative
is per class tracking of cookies and that just duplicates a lot of
stuff for no raisin (the 2nd copy lives in the rt-mutex PI code).

[Joel: folded fixes]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Don Hiatt <dhiatt@digitalocean.com>
Tested-by: Hongyu Ning <hongyu.ning@linux.intel.com>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210422123308.496975854@infradead.org
Signed-off-by: Lin Shengwang <linshengwang1@huawei.com>
Reviewed-by: lihua <hucool.lihua@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 23 May 2022, 1 commit
-
-
Submitted by Igor Pylypiv

stable inclusion
from stable-v5.10.102
commit de55891e162cac0ae058e05c2527fd32cc435ac0
bugzilla: https://gitee.com/openeuler/kernel/issues/I567K6
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=de55891e162cac0ae058e05c2527fd32cc435ac0

--------------------------------

[ Upstream commit 67d6212a ]

This reverts commit 774a1221.

We need to finish all async code before the module init sequence is
done. In the reverted commit the PF_USED_ASYNC flag was added to mark
a thread that called async_schedule(). Then the PF_USED_ASYNC flag
was used to determine whether or not async_synchronize_full() needs
to be invoked. This works when the modprobe thread is calling
async_schedule(), but it does not work if the module dispatches init
code to a worker thread which then calls async_schedule().

For example, PCI driver probing is invoked from a worker thread based
on the node where the device is attached:

    if (cpu < nr_cpu_ids)
        error = work_on_cpu(cpu, local_pci_probe, &ddi);
    else
        error = local_pci_probe(&ddi);

We end up in a situation where a worker thread gets the PF_USED_ASYNC
flag set instead of the modprobe thread. As a result,
async_synchronize_full() is not invoked and modprobe completes without
waiting for the async code to finish.

The issue was discovered while loading the pm80xx driver:
(scsi_mod.scan=async)

    modprobe pm80xx                      worker
    ...
      do_init_module()
      ...
        pci_call_probe()
          work_on_cpu(local_pci_probe)
                                         local_pci_probe()
                                           pm8001_pci_probe()
                                             scsi_scan_host()
                                               async_schedule()
                                               worker->flags |= PF_USED_ASYNC;
                                         ...
          < return from worker >
      ...
      if (current->flags & PF_USED_ASYNC) <--- false
          async_synchronize_full();

Commit 21c3c5d2 ("block: don't request module during elevator init")
fixed the deadlock issue which the reverted commit 774a1221 ("module,
async: async_synchronize_full() on module init iff async is used")
tried to fix.

Since commit 0fdff3ec ("async, kmod: warn on synchronous
request_module() from async workers") synchronous module loading from
async is not allowed.

Given that the original deadlock issue is fixed and it is no longer
allowed to call synchronous request_module() from async, we can remove
the PF_USED_ASYNC flag to make module init consistently invoke
async_synchronize_full() unless async module probe is requested.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Changyuan Lyu <changyuanl@google.com>
Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 10 May 2022, 2 commits
-
-
Submitted by Guan Jing

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

We have added two statistics for the qos smt expeller:
a) nr_qos_smt_send_ipi: the number of IPIs with which online tasks
   expel offline tasks;
b) nr_qos_smt_expelled: the number of times an offline task is not
   picked.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
Submitted by Guan Jing

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I52611
CVE: NA

--------------------------------

We implement the qos smt expeller with the following two points:
a) when online tasks and offline tasks are running on the same
   physical cpu, the online tasks send an IPI to expel the offline
   tasks on the smt sibling cpus;
b) while online tasks are running, the smt sibling cpus do not allow
   offline tasks to be selected.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 23 February 2022, 1 commit
-
-
Submitted by Peng Wu

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4PM01
CVE: NA

------------------------------------------

Add a reliable flag for user tasks. A user task with the reliable flag
can only allocate memory from the mirrored region. PF_RELIABLE
represents the task's reliable flag.

- The init task is regarded as a special task which allocates memory
  from the mirrored region.

- For normal user tasks, the reliable flag can be set via the procfs
  interface shown below and is inherited via fork().

A user can change a user task's reliable flag by

    $ echo [0/1] > /proc/<pid>/reliable

and check a user task's reliable flag by

    $ cat /proc/<pid>/reliable

Note: the global init task's reliable file can not be accessed.

Signed-off-by: Peng Wu <wupeng58@huawei.com>
Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 29 January 2022, 2 commits
-
-
Submitted by Guan Jing

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KAP1?from=project-issue
CVE: NA

-------------------------------

We reserve some fields beforehand for sched structures prone to
change; therefore, we can hot add/change sched features with this
enhancement. After reserving, cache behaviour normally does not matter
as the reserved fields are not accessed at all.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
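
A minimal sketch of the reservation pattern, assuming openEuler's
KABI_RESERVE() helpers; which sched structure gets how many slots is
illustrative:

    /* illustrative only */
    struct sched_entity {
        /* ... existing fields ... */

        KABI_RESERVE(1)
        KABI_RESERVE(2)
        KABI_RESERVE(3)
        KABI_RESERVE(4)
    };

Each KABI_RESERVE(n) expands to an unused 64-bit placeholder, so a
later feature can claim a slot without changing the structure size or
breaking the kABI.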
-
Submitted by Ard Biesheuvel

mainline inclusion
from mainline-v5.16-rc1
commit bcf9033e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4Q94A
CVE: NA

---------------------------

THREAD_INFO_IN_TASK moved the CPU field out of thread_info, but this
causes some issues on architectures that define raw_smp_processor_id()
in terms of this field, due to the fact that #include'ing
linux/sched.h to get at struct task_struct is problematic in terms of
circular dependencies.

Given that thread_info and task_struct are the same data structure
anyway when THREAD_INFO_IN_TASK=y, let's move it back so that having
access to the type definition of struct thread_info is sufficient to
reference the CPU number of the current task.

Note that this requires THREAD_INFO_IN_TASK's definition of the
task_thread_info() helper to be updated, as task_cpu() takes a
pointer-to-const, whereas task_thread_info() (which is used to
generate lvalues as well) needs a non-const pointer. So make it a
macro instead.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Guan Jing <guanjing6@huawei.com>

Conflicts:
    arch/arm64/kernel/head.S

Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
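
A condensed sketch of the helper change, modeled on the upstream
commit (arch-specific context trimmed):

    /* The CPU number lives in thread_info again... */
    struct thread_info {
        /* ... */
        u32 cpu;    /* current CPU, valid with THREAD_INFO_IN_TASK */
    };

    /* ...and the helper becomes a macro: a macro is type-agnostic,
     * so it serves both task_cpu()'s pointer-to-const argument and
     * the callers that need an lvalue. */
    #define task_thread_info(task) (&(task)->thread_info)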
-
- 31 December 2021, 1 commit
-
-
Submitted by Guan Jing

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KAP1?from=project-issue
CVE: NA

--------

We reserve some fields beforehand for sched structures prone to
change; therefore, we can hot add/change sched features with this
enhancement. After reserving, cache behaviour normally does not matter
as the reserved fields are not accessed at all.

Signed-off-by: Guan Jing <guanjing6@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Cheng Jian <cj.chengjian@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 29 December 2021, 1 commit
-
-
Submitted by Marcelo Tosatti

mainline inclusion
from mainline-5.14-rc1
commit a1dfb631
category: feature
feature: Deep isolation
bugzilla: https://gitee.com/openeuler/kernel/issues/I4N00D
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a1dfb6311c7739e21e160bc4c5575a1b21b48c87

--------------------------------

When the tick dependency of a task is updated, we want it to
acknowledge the new state and restart the tick if needed. If the task
is not running, we don't need to kick it because it will observe the
new dependency upon scheduling in. But if the task is running, we may
need to send an IPI to it so that it gets notified.

Unfortunately we don't have the means to check if a task is running
in a race free way. Checking p->on_cpu in a synchronized way against
p->tick_dep_mask would imply adding a full barrier between
prepare_task_switch() and tick_nohz_task_switch(), which we want to
avoid in this fast-path.

Therefore we blindly fire an IPI to the task's CPU.

Meanwhile we can check if the task is queued on the CPU rq because
p->on_rq is always set to TASK_ON_RQ_QUEUED _before_ schedule() and
its full barrier that precedes tick_nohz_task_switch(). And if the
task is queued on a nohz_full CPU, it also has fair chances to be
running as the isolation constraints prescribe running single tasks
on full dynticks CPUs.

So use this as a trick to check if we can spare an IPI toward a
non-running task.

NOTE: For the ordering to be correct, it is assumed that we never
deactivate a task while it is running, the only exception being the
task deactivating itself while scheduling out.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512232924.150322-9-frederic@kernel.org
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Chao Liu <liuchao173@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
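
A condensed sketch of the resulting kick logic (modeled on the
upstream tick_nohz_kick_task(); exact helper names may differ by
version):

    static void tick_nohz_kick_task(struct task_struct *tsk)
    {
        int cpu;

        /*
         * Not queued means not running: the new dependency will be
         * observed when the task schedules in, so spare the IPI.
         */
        if (!READ_ONCE(tsk->on_rq))
            return;

        /*
         * Queued on a nohz_full CPU: likely running, so blindly kick
         * that CPU; a spurious IPI is harmless.
         */
        cpu = task_cpu(tsk);

        preempt_disable();
        if (cpu_online(cpu))
            tick_nohz_full_kick_cpu(cpu);
        preempt_enable();
    }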
-
- 10 December 2021, 1 commit
-
-
Submitted by Zheng Zucheng

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4LH1X
CVE: NA

--------------------------------

When online tasks occupy the cpu for a long time, offline tasks get no
cpu to run on, and a priority inversion issue may be triggered in this
case. If that occurs, we unthrottle offline tasks and give them a
chance to run. When online tasks have occupied the cpu for over 5s
(default value), we unthrottle offline tasks and enter an msleep loop
before exiting to usermode, until the cpu goes idle.

Signed-off-by: Zhang Qiao <zhangqiao22@huawei.com>
Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 15 November 2021, 2 commits
-
-
Submitted by Andrii Nakryiko

mainline inclusion
from mainline-v5.15-rc1
commit c7603cfa
category: bugfix
bugzilla: 182996 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c7603cfa04e7c3a435b31d065f7cbdc829428f6e

--------------------------------

b910eaaa ("bpf: Fix NULL pointer dereference in
bpf_get_local_storage() helper") fixed the problem with cgroup-local
storage use in BPF by pre-allocating a per-CPU array of 8 cgroup
storage pointers to accommodate possible BPF program preemptions and
nested executions.

While this seems to work well in practice, it introduces a new and
unnecessary failure mode in which not all BPF programs might be
executed if we fail to find an unused slot for cgroup storage, however
unlikely that is. It might also not be so unlikely when/if we allow
sleepable cgroup BPF programs in the future.

Further, the way that cgroup storage is implemented as an
ambiently-available property during entire BPF program execution is a
convenient way to pass extra information to BPF programs and helpers
without requiring user code to pass around extra arguments explicitly.
So it would be good to have a generic solution that can allow
implementing this without arbitrary restrictions. Ideally, such a
solution would work for both preemptable and sleepable BPF programs in
exactly the same way.

This patch introduces such a solution, bpf_run_ctx. It adds one
pointer field (bpf_ctx) to task_struct. This field is maintained by
the BPF_PROG_RUN family of macros in such a way that it always stays
valid throughout BPF program execution. BPF program preemption is
handled by remembering the previous current->bpf_ctx value locally
while executing a nested BPF program and restoring the old value
after the nested BPF program finishes. This is handled by two helper
functions, bpf_set_run_ctx() and bpf_reset_run_ctx(), which are
supposed to be used before and after BPF program runs, respectively.

Restoring the old value of the pointer handles preemption, while the
bpf_run_ctx pointer being a property of current task_struct naturally
solves this problem for sleepable BPF programs by "following" BPF
program execution as it is scheduled in and out of the CPU. It would
even allow CPU migration of BPF programs, even though that's not
currently allowed by BPF infra.

This patch cleans up cgroup-local storage handling as a first
application. The design itself is generic, though, with bpf_run_ctx
being an empty struct that is supposed to be embedded into a specific
struct for a given BPF program type (bpf_cg_run_ctx in this case).
Follow-up patches are planned that will expand this mechanism to
other uses within tracing BPF programs.

To verify that this change doesn't revert the fix to the original
cgroup storage issue, I ran the same repro as in the original report
([0]) and didn't get any problems. Replacing
bpf_reset_run_ctx(old_run_ctx) with bpf_reset_run_ctx(NULL) triggers
the issue pretty quickly (so the repro does work).

[0] https://lore.kernel.org/bpf/YEEvBUiJl2pJkxTd@krava/

Fixes: b910eaaa ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20210712230615.3525979-1-andrii@kernel.org

Conflicts:
    include/linux/bpf.h
    include/linux/sched.h
    kernel/fork.c
    net/bpf/test_run.c

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Kuohai Xu <xukuohai@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
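
A condensed sketch of the helper pair, modeled on the upstream
include/linux/bpf.h (the CONFIG_BPF_SYSCALL guards and BPF_PROG_RUN
plumbing are trimmed):

    struct bpf_run_ctx {};

    static inline struct bpf_run_ctx *
    bpf_set_run_ctx(struct bpf_run_ctx *new_ctx)
    {
        struct bpf_run_ctx *old_ctx = current->bpf_ctx;

        current->bpf_ctx = new_ctx;   /* nested program's context */
        return old_ctx;               /* remembered by the caller */
    }

    static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx)
    {
        current->bpf_ctx = old_ctx;   /* restore on exit */
    }

A runner brackets program execution with the pair, so preemption by a
nested program and sleepable rescheduling both keep current->bpf_ctx
valid:

    old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
    ret = BPF_PROG_RUN(prog, ctx);
    bpf_reset_run_ctx(old_run_ctx);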
-
Submitted by Peter Zijlstra

stable inclusion
from stable-5.10.74
commit bb893f075431e97dd72cde0721957a73d26578a8
bugzilla: 182986 https://gitee.com/openeuler/kernel/issues/I4I3MG
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bb893f075431e97dd72cde0721957a73d26578a8

--------------------------------

[ Upstream commit 83d40a61 ]

vmlinux.o: warning: objtool: check_preemption_disabled()+0x81: call to is_percpu_thread() leaves .noinstr.text section

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210928084218.063371959@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-
- 21 October 2021, 1 commit
-
-
Submitted by Tony Luck

stable inclusion
from stable-5.10.68
commit 619d747c1850bab61625ca9d8b4730f470a5947b
bugzilla: 182671 https://gitee.com/openeuler/kernel/issues/I4EWUH
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=619d747c1850bab61625ca9d8b4730f470a5947b

--------------------------------

commit 81065b35 upstream.

There are two cases for machine check recovery:

1) The machine check was triggered by ring3 (application) code. This
   is the simpler case. The machine check handler simply queues work
   to be executed on return to user. That code unmaps the page from
   all users and arranges to send a SIGBUS to the task that triggered
   the poison.

2) The machine check was triggered in kernel code that is covered by
   an exception table entry. In this case the machine check handler
   still queues a work entry to unmap the page, etc. but this will
   not be called right away because the #MC handler returns to the
   fix up code address in the exception table entry.

Problems occur if the kernel triggers another machine check before
the return to user processes the first queued work item.

Specifically, the work is queued using the ->mce_kill_me callback
structure in the task struct for the current thread. Attempting to
queue a second work item using this same callback results in a loop
in the linked list of work functions to call. So when the kernel does
return to user, it enters an infinite loop processing the same entry
for ever.

There are some legitimate scenarios where the kernel may take a
second machine check before returning to the user.

1) Some code (e.g. futex) first tries a get_user() with page faults
   disabled. If this fails, the code retries with page faults enabled
   expecting that this will resolve the page fault.

2) Copy from user code retries a copy in byte-at-time mode to check
   whether any additional bytes can be copied.

On the other side of the fence are some bad drivers that do not check
the return value from individual get_user() calls and may access
multiple user addresses without noticing that some/all calls have
failed.

Fix by adding a counter (current->mce_count) to keep track of repeated
machine checks before task_work() is called. First machine check saves
the address information and calls task_work_add(). Subsequent machine
checks before that task_work call back is executed check that the
address is in the same page as the first machine check (since the
callback will offline exactly one page).

Expected worst case is four machine checks before moving on (e.g. one
user access with page faults disabled, then a repeat to the same
address with page faults enabled ... repeat in copy tail bytes). Just
in case there is some code that loops forever enforce a limit of 10.

[ bp: Massage commit message, drop noinstr, fix typo, extend panic
  messages. ]

Fixes: 5567d11c ("x86/mce: Send #MC singal from task work")
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/YT/IJ9ziLqmtqEPu@agluck-desk2.amr.corp.intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
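
A condensed sketch of the counting logic, modeled on the upstream
queue_task_work() (setup of the callback details trimmed):

    static void queue_task_work(struct mce *m, char *msg,
                                int kill_current_task)
    {
        int count = ++current->mce_count;

        /* First call: save the details and queue the task_work. */
        if (count == 1) {
            current->mce_addr = m->addr;
            /* ... save status/vaddr, set mce_kill_me.func ... */
        }

        /* Ten is likely overkill; don't expect more than two faults
         * before task_work() runs. */
        if (count > 10)
            mce_panic("Too many consecutive machine checks while accessing user data",
                      m, msg);

        /* Later calls must hit the same page as the first one, since
         * the callback will offline exactly one page. */
        if (count > 1 && (current->mce_addr >> PAGE_SHIFT) !=
                         (m->addr >> PAGE_SHIFT))
            mce_panic("Consecutive machine checks to different user pages",
                      m, msg);

        /* Do not call task_work_add() more than once. */
        if (count > 1)
            return;

        task_work_add(current, &current->mce_kill_me, TWA_RESUME);
    }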
-
- 28 July 2021, 1 commit
-
-
Submitted by Peter Zijlstra (Intel)

mainline inclusion
from mainline-5.12-rc1
commit b965f1dd
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I410UT
CVE: NA
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b965f1ddb47daa5b8b2e2bc9c921431236830367

---------------------------

Provide static calls to control cond_resched() (called in
!CONFIG_PREEMPT) and might_resched() (called in
CONFIG_PREEMPT_VOLUNTARY) so that we can override their behaviour when
preempt= is overridden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

[fweisbec: branch might_resched() directly to __cond_resched(), only
 define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org
Signed-off-by: Ma Junhai <majunhai2@huawei.com>

Conflicts:
    include/linux/kernel.h

Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
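
A condensed sketch of the mechanism, modeled on the upstream
PREEMPT_DYNAMIC code (the exact macros vary between kernel versions):

    /* kernel/sched/core.c: default the call to a no-op returning 0,
     * since full preemption needs no voluntary resched points. */
    DEFINE_STATIC_CALL_RET0(cond_resched, __cond_resched);
    EXPORT_STATIC_CALL(cond_resched);

    /* include/linux/sched.h: callers go through the static call,
     * which "preempt=voluntary/none" can later repoint at
     * __cond_resched(). */
    DECLARE_STATIC_CALL(cond_resched, __cond_resched);

    static __always_inline int _cond_resched(void)
    {
        return static_call(cond_resched)();
    }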
-
- 14 July 2021, 1 commit
-
-
Submitted by Zheng Zucheng

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZX4D
CVE: NA

--------------------------------

If online tasks occupy 100% of the CPU resources, offline tasks can't
be scheduled since they are throttled; as a result, an offline task
can't respond in time after receiving a SIGKILL signal.

Signed-off-by: Zheng Zucheng <zhengzucheng@huawei.com>
Reviewed-by: Chen Hui <judy.chenhui@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
-