- 14 6月, 2022 1 次提交
-
-
由 Kai Lueke 提交于
stable inclusion from stable-v5.10.107 commit bdf0316982f00010d6e56f1379a51cd0568d51cd bugzilla: https://gitee.com/openeuler/kernel/issues/I574A2 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bdf0316982f00010d6e56f1379a51cd0568d51cd -------------------------------- commit a3d9001b upstream. This reverts commit 68ac0f38 because ID 0 was meant to be used for configuring the policy/state without matching for a specific interface (e.g., Cilium is affected, see https://github.com/cilium/cilium/pull/18789 and https://github.com/cilium/cilium/pull/19019). Signed-off-by: NKai Lueke <kailueke@linux.microsoft.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYu Liao <liaoyu15@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 07 6月, 2022 39 次提交
-
-
由 Jann Horn 提交于
stable inclusion from stable-v5.10.110 commit 5a41a3033a9344d7683340e3d83f5435ffb06501 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I57IYD CVE: CVE-2022-30594 -------------------------------- commit ee1fee90 upstream. Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged operation because it allows the tracee to completely bypass all seccomp filters on kernels with CONFIG_CHECKPOINT_RESTORE=y. It is only supposed to be settable by a process with global CAP_SYS_ADMIN, and only if that process is not subject to any seccomp filters at all. However, while these permission checks were done on the PTRACE_SETOPTIONS path, they were missing on the PTRACE_SEIZE path, which also sets user-specified ptrace flags. Move the permissions checks out into a helper function and let both ptrace_attach() and ptrace_setoptions() call it. Cc: stable@kernel.org Fixes: 13c4a901 ("seccomp: add ptrace options for suspend/resume") Signed-off-by: NJann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20220319010838.1386861-1-jannh@google.comSigned-off-by: NEric W. Biederman <ebiederm@xmission.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLi Huafei <lihuafei1@huawei.com> Reviewed-by: NKuohai Xu <xukuohai@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhihao Cheng 提交于
hulk inclusion category: bugfix bugzilla: 186846, https://gitee.com/openeuler/kernel/issues/I5A80Q -------------------------------- Commit 7bc3e6e5 ("proc: Use a list of inodes to flush from proc") moved proc_flush_task() behind __exit_signal(). Then, process systemd can take long period high cpu usage during releasing task in following concurrent processes: systemd ps kernel_waitid stat(/proc/tgid) do_wait filename_lookup wait_consider_task lookup_fast release_task __exit_signal __unhash_process detach_pid __change_pid // remove task->pid_links d_revalidate -> pid_revalidate // 0 d_invalidate(/proc/tgid) shrink_dcache_parent(/proc/tgid) d_walk(/proc/tgid) spin_lock_nested(/proc/tgid/fd) // iterating opened fd proc_flush_pid | d_invalidate (/proc/tgid/fd) | shrink_dcache_parent(/proc/tgid/fd) | shrink_dentry_list(subdirs) ↓ shrink_lock_dentry(/proc/tgid/fd) --> race on dentry lock Function d_invalidate() will remove dentry from hash firstly, but why does proc_flush_pid() process dentry '/proc/tgid/fd' before dentry '/proc/tgid'? That's because proc_pid_make_inode() adds proc inode in reverse order by invoking hlist_add_head_rcu(). But proc should not add any inodes under '/proc/tgid' except '/proc/tgid/task/pid', fix it by adding inode into 'pid->inodes' only if the inode is /proc/tgid or /proc/tgid/task/pid. Performance regression: Create 200 tasks, each task open one file for 50,000 times. Kill all tasks when opened files exceed 10,000,000 (cat /proc/sys/fs/file-nr). Before fix: $ time killall -wq aa real 4m40.946s # During this period, we can see 'ps' and 'systemd' taking high cpu usage. After fix: $ time killall -wq aa real 1m20.732s # During this period, we can see 'systemd' taking high cpu usage. Fixes: 7bc3e6e5 ("proc: Use a list of inodes to flush from proc") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216054Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: NZhang Yi <yi.zhang@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Reinette Chatre 提交于
mainline inclusion from mainline-v5.19-rc1 commit af117837 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I59I14 CVE: CVE-2021-33135 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=af117837ceb9a78e995804ade4726ad2c2c8981f -------------------------------- Haitao reported encountering a WARN triggered by the ENCLS[ELDU] instruction faulting with a #GP. The WARN is encountered when the reclaimer evicts a range of pages from the enclave when the same pages are faulted back right away. Consider two enclave pages (ENCLAVE_A and ENCLAVE_B) sharing a PCMD page (PCMD_AB). ENCLAVE_A is in the enclave memory and ENCLAVE_B is in the backing store. PCMD_AB contains just one entry, that of ENCLAVE_B. Scenario proceeds where ENCLAVE_A is being evicted from the enclave while ENCLAVE_B is faulted in. sgx_reclaim_pages() { ... /* * Reclaim ENCLAVE_A */ mutex_lock(&encl->lock); /* * Get a reference to ENCLAVE_A's * shmem page where enclave page * encrypted data will be stored * as well as a reference to the * enclave page's PCMD data page, * PCMD_AB. * Release mutex before writing * any data to the shmem pages. */ sgx_encl_get_backing(...); encl_page->desc |= SGX_ENCL_PAGE_BEING_RECLAIMED; mutex_unlock(&encl->lock); /* * Fault ENCLAVE_B */ sgx_vma_fault() { mutex_lock(&encl->lock); /* * Get reference to * ENCLAVE_B's shmem page * as well as PCMD_AB. */ sgx_encl_get_backing(...) /* * Load page back into * enclave via ELDU. */ /* * Release reference to * ENCLAVE_B' shmem page and * PCMD_AB. */ sgx_encl_put_backing(...); /* * PCMD_AB is found empty so * it and ENCLAVE_B's shmem page * are truncated. */ /* Truncate ENCLAVE_B backing page */ sgx_encl_truncate_backing_page(); /* Truncate PCMD_AB */ sgx_encl_truncate_backing_page(); mutex_unlock(&encl->lock); ... } mutex_lock(&encl->lock); encl_page->desc &= ~SGX_ENCL_PAGE_BEING_RECLAIMED; /* * Write encrypted contents of * ENCLAVE_A to ENCLAVE_A shmem * page and its PCMD data to * PCMD_AB. */ sgx_encl_put_backing(...) /* * Reference to PCMD_AB is * dropped and it is truncated. * ENCLAVE_A's PCMD data is lost. */ mutex_unlock(&encl->lock); } What happens next depends on whether it is ENCLAVE_A being faulted in or ENCLAVE_B being evicted - but both end up with ENCLS[ELDU] faulting with a #GP. If ENCLAVE_A is faulted then at the time sgx_encl_get_backing() is called a new PCMD page is allocated and providing the empty PCMD data for ENCLAVE_A would cause ENCLS[ELDU] to #GP If ENCLAVE_B is evicted first then a new PCMD_AB would be allocated by the reclaimer but later when ENCLAVE_A is faulted the ENCLS[ELDU] instruction would #GP during its checks of the PCMD value and the WARN would be encountered. Noting that the reclaimer sets SGX_ENCL_PAGE_BEING_RECLAIMED at the time it obtains a reference to the backing store pages of an enclave page it is in the process of reclaiming, fix the race by only truncating the PCMD page after ensuring that no page sharing the PCMD page is in the process of being reclaimed. Cc: stable@vger.kernel.org Fixes: 08999b24 ("x86/sgx: Free backing memory after faulting the enclave page") Reported-by: NHaitao Huang <haitao.huang@intel.com> Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NHaitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/ed20a5db516aa813873268e125680041ae11dfcf.1652389823.git.reinette.chatre@intel.comSigned-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Reinette Chatre 提交于
mainline inclusion from mainline-v5.19-rc1 commit 2154e1c1 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I59I14 CVE: CVE-2021-33135 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2154e1c11b7080aa19f47160bd26b6f39bbd7824 -------------------------------- Recent commit 08999b24 ("x86/sgx: Free backing memory after faulting the enclave page") expanded __sgx_encl_eldu() to clear an enclave page's PCMD (Paging Crypto MetaData) from the PCMD page in the backing store after the enclave page is restored to the enclave. Since the PCMD page in the backing store is modified the page should be marked as dirty to ensure the modified data is retained. Cc: stable@vger.kernel.org Fixes: 08999b24 ("x86/sgx: Free backing memory after faulting the enclave page") Signed-off-by: NReinette Chatre <reinette.chatre@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NHaitao Huang <haitao.huang@intel.com> Link: https://lkml.kernel.org/r/00cd2ac480db01058d112e347b32599c1a806bc4.1652389823.git.reinette.chatre@intel.comSigned-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jarkko Sakkinen 提交于
mainline inclusion from mainline-v5.17-rc8 commit 08999b24 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I59I14 CVE: CVE-2021-33135 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08999b2489b4c9b939d7483dbd03702ee4576d96 -------------------------------- There is a limited amount of SGX memory (EPC) on each system. When that memory is used up, SGX has its own swapping mechanism which is similar in concept but totally separate from the core mm/* code. Instead of swapping to disk, SGX swaps from EPC to normal RAM. That normal RAM comes from a shared memory pseudo-file and can itself be swapped by the core mm code. There is a hierarchy like this: EPC <-> shmem <-> disk After data is swapped back in from shmem to EPC, the shmem backing storage needs to be freed. Currently, the backing shmem is not freed. This effectively wastes the shmem while the enclave is running. The memory is recovered when the enclave is destroyed and the backing storage freed. Sort this out by freeing memory with shmem_truncate_range(), as soon as a page is faulted back to the EPC. In addition, free the memory for PCMD pages as soon as all PCMD's in a page have been marked as unused by zeroing its contents. Cc: stable@vger.kernel.org Fixes: 1728ab54 ("x86/sgx: Add a page reclaimer") Reported-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NJarkko Sakkinen <jarkko@kernel.org> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.orgSigned-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chen Wandun 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I545DF CVE: NA -------------------------------- introduce per-memcg reclaim interface for cgroup v1, and disable memory reclaim for root memcg. Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yosry Ahmed 提交于
mainline inclusion from mainline-v5.19-rc1 commit eae3cb2e category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I545DF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eae3cb2e87ff84547e66211b81301a8f9122840f -------------------------------- Add a new test for memory.reclaim that verifies that the interface correctly reclaims memory as intended, from both anon and file pages. Link: https://lkml.kernel.org/r/20220425190040.2475377-5-yosryahmed@google.comSigned-off-by: NYosry Ahmed <yosryahmed@google.com> Acked-by: NRoman Gushchin <roman.gushchin@linux.dev> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yosry Ahmed 提交于
mainline inclusion from mainline-v5.19-rc1 commit a3622a53 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I545DF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a3622a53e620700053b648478dbc638ad373be3b -------------------------------- Currently, alloc_anon_noexit() calls alloc_anon() which instantly frees the allocated memory. alloc_anon_noexit() is usually used with cg_run_nowait() to run a process in the background that allocates memory. It makes sense for the background process to keep the memory allocated and not instantly free it (otherwise there is no point of running it in the background). Link: https://lkml.kernel.org/r/20220425190040.2475377-4-yosryahmed@google.comSigned-off-by: NYosry Ahmed <yosryahmed@google.com> Acked-by: NRoman Gushchin <roman.gushchin@linux.dev> Acked-by: NShakeel Butt <shakeelb@google.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yosry Ahmed 提交于
mainline inclusion from mainline-v5.19-rc1 commit 6c26df84 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I545DF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6c26df84e1f2f9181c0741865105a53537da842c -------------------------------- Currently, cg_read()/cg_write() returns 0 on success and -1 on failure. Modify them to return the -errno on failure. Link: https://lkml.kernel.org/r/20220425190040.2475377-3-yosryahmed@google.comSigned-off-by: NYosry Ahmed <yosryahmed@google.com> Acked-by: NShakeel Butt <shakeelb@google.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NRoman Gushchin <roman.gushchin@linux.dev> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: Wei Xu <weixugc@google.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Shakeel Butt 提交于
mainline inclusion from mainline-v5.19-rc1 commit 94968384 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I545DF CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=94968384dde15d48263bfc59d280cd71b1259d8c -------------------------------- This patch series adds a memory.reclaim proactive reclaim interface. The rationale behind the interface and how it works are in the first patch. This patch (of 4): Introduce a memcg interface to trigger memory reclaim on a memory cgroup. Use case: Proactive Reclaim --------------------------- A userspace proactive reclaimer can continuously probe the memcg to reclaim a small amount of memory. This gives more accurate and up-to-date workingset estimation as the LRUs are continuously sorted and can potentially provide more deterministic memory overcommit behavior. The memory overcommit controller can provide more proactive response to the changing behavior of the running applications instead of being reactive. A userspace reclaimer's purpose in this case is not a complete replacement for kswapd or direct reclaim, it is to proactively identify memory savings opportunities and reclaim some amount of cold pages set by the policy to free up the memory for more demanding jobs or scheduling new jobs. A user space proactive reclaimer is used in Google data centers. Additionally, Meta's TMO paper recently referenced a very similar interface used for user space proactive reclaim: https://dl.acm.org/doi/pdf/10.1145/3503222.3507731 Benefits of a user space reclaimer: ----------------------------------- 1) More flexible on who should be charged for the cpu of the memory reclaim. For proactive reclaim, it makes more sense to be centralized. 2) More flexible on dedicating the resources (like cpu). The memory overcommit controller can balance the cost between the cpu usage and the memory reclaimed. 3) Provides a way to the applications to keep their LRUs sorted, so, under memory pressure better reclaim candidates are selected. This also gives more accurate and uptodate notion of working set for an application. Why memory.high is not enough? ------------------------------ - memory.high can be used to trigger reclaim in a memcg and can potentially be used for proactive reclaim. However there is a big downside in using memory.high. It can potentially introduce high reclaim stalls in the target application as the allocations from the processes or the threads of the application can hit the temporary memory.high limit. - Userspace proactive reclaimers usually use feedback loops to decide how much memory to proactively reclaim from a workload. The metrics used for this are usually either refaults or PSI, and these metrics will become messy if the application gets throttled by hitting the high limit. - memory.high is a stateful interface, if the userspace proactive reclaimer crashes for any reason while triggering reclaim it can leave the application in a bad state. - If a workload is rapidly expanding, setting memory.high to proactively reclaim memory can result in actually reclaiming more memory than intended. The benefits of such interface and shortcomings of existing interface were further discussed in this RFC thread: https://lore.kernel.org/linux-mm/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/ Interface: ---------- Introducing a very simple memcg interface 'echo 10M > memory.reclaim' to trigger reclaim in the target memory cgroup. The interface is introduced as a nested-keyed file to allow for future optional arguments to be easily added to configure the behavior of reclaim. Possible Extensions: -------------------- - This interface can be extended with an additional parameter or flags to allow specifying one or more types of memory to reclaim from (e.g. file, anon, ..). - The interface can also be extended with a node mask to reclaim from specific nodes. This has use cases for reclaim-based demotion in memory tiering systens. - A similar per-node interface can also be added to support proactive reclaim and reclaim-based demotion in systems without memcg. - Add a timeout parameter to make it easier for user space to call the interface without worrying about being blocked for an undefined amount of time. For now, let's keep things simple by adding the basic functionality. [yosryahmed@google.com: worked on versions v2 onwards, refreshed to current master, updated commit message based on recent discussions and use cases] Link: https://lkml.kernel.org/r/20220425190040.2475377-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220425190040.2475377-2-yosryahmed@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Co-developed-by: NYosry Ahmed <yosryahmed@google.com> Signed-off-by: NYosry Ahmed <yosryahmed@google.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NWei Xu <weixugc@google.com> Acked-by: NRoman Gushchin <roman.gushchin@linux.dev> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Tejun Heo <tj@kernel.org> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Greg Thelen <gthelen@google.com> Cc: Chen Wandun <chenwandun@huawei.com> Cc: Vaibhav Jain <vaibhav@linux.ibm.com> Cc: "Michal Koutn" <mkoutny@suse.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Mingwei Zhang 提交于
mainline inclusion from mainline-v5.18-rc4 commit 683412cc category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I58DFI CVE: CVE-2022-0171 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=683412ccf61294d727ead4a73d97397396e69a6b -------------------------------- Flush the CPU caches when memory is reclaimed from an SEV guest (where reclaim also includes it being unmapped from KVM's memslots). Due to lack of coherency for SEV encrypted memory, failure to flush results in silent data corruption if userspace is malicious/broken and doesn't ensure SEV guest memory is properly pinned and unpinned. Cache coherency is not enforced across the VM boundary in SEV (AMD APM vol.2 Section 15.34.7). Confidential cachelines, generated by confidential VM guests have to be explicitly flushed on the host side. If a memory page containing dirty confidential cachelines was released by VM and reallocated to another user, the cachelines may corrupt the new user at a later time. KVM takes a shortcut by assuming all confidential memory remain pinned until the end of VM lifetime. Therefore, KVM does not flush cache at mmu_notifier invalidation events. Because of this incorrect assumption and the lack of cache flushing, malicous userspace can crash the host kernel: creating a malicious VM and continuously allocates/releases unpinned confidential memory pages when the VM is running. Add cache flush operations to mmu_notifier operations to ensure that any physical memory leaving the guest VM get flushed. In particular, hook mmu_notifier_invalidate_range_start and mmu_notifier_release events and flush cache accordingly. The hook after releasing the mmu lock to avoid contention with other vCPUs. Cc: stable@vger.kernel.org Suggested-by: NSean Christpherson <seanjc@google.com> Reported-by: NMingwei Zhang <mizhang@google.com> Signed-off-by: NMingwei Zhang <mizhang@google.com> Message-Id: <20220421031407.2516575-4-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> conflicts: arch/x86/include/asm/kvm-x86-ops.h arch/x86/kvm/x86.c virt/kvm/kvm_main.c Signed-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NLiao Chang <liaochang1@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhang Jian 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I54IL8 CVE: NA backport: openEuler-22.03-LTS ----------------------------- When passing numa id to sp_alloc, sometimes numa id does not work. This is because memory policy will change numa id to a preferred one if memory policy is set. Fix the error by mbind virtual address to desired numa id. Signed-off-by: NZhang Jian <zhangjian210@huawei.com> Signed-off-by: NZhou Guanghui <zhouguanghui1@huawei.com> Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Wensheng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I54I7W CVE: NA backport: openEuler-22.03-LTS -------------------------------------------------- There is a ABBA deadlock in the process of sp_group_add_task and proc_stat_show(). PROCESS A: sp_group_add_task() acquire sp_group_sem write lock ->sp_init_proc_stat() acquire sp_spg_stat_sem write lock PROCESS B: proc_stat_show() acquire sp_spg_stat_sem read lock ->idr_proc_stat_cb() acquire sp_group_sem read lock Here we choose the simplest way that acquires sp_group_sem and sp_stat_sem read lock subsequently in proc_stat_show(), since it just has effect on the process of debug feature. Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Guo Mengqi 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I54I59 CVE: NA backport: openEuler-22.03-LTS -------------------------------------------------- When hisi_oom_notifier_call calls spg_overview_show, it requires the global rwsem sp_group_sem, which had been held by another process when oomed. This leads to kernel hungtask. At another position the unecessary sp_group_sem causes an ABBA deadlock. [ 1934.549016] INFO: task klogd:2757 blocked for more than 120 seconds. [ 1934.562408] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 1934.570231] klogd D 0 2757 2746 0x00000000 [ 1934.575707] Call trace: [ 1934.578162] __switch_to+0xe8/0x150 [ 1934.581648] __schedule+0x250/0x558 [ 1934.585133] schedule+0x30/0xf0 [ 1934.588267] rwsem_down_read_failed+0x10c/0x188 [ 1934.592788] down_read+0x60/0x68 [ 1934.596015] spg_overview_show.part.31+0xc8/0xf8 [ 1934.600622] spg_overview_show+0x2c/0x38 [ 1934.604543] hisi_oom_notifier_call+0xe8/0x120 [ 1934.608975] out_of_memory+0x7c/0x570 [ 1934.612631] __alloc_pages_nodemask+0xcfc/0xd98 [ 1934.617158] alloc_pages_current+0x88/0xf0 [ 1934.621246] __page_cache_alloc+0x8c/0xd8 [ 1934.625247] page_cache_alloc_inode+0x48/0x58 [ 1934.629595] filemap_fault+0x360/0x8e0 [ 1934.633341] ext4_filemap_fault+0x38/0x128 [ 1934.637431] __do_fault+0x50/0x218 [ 1934.640822] __handle_mm_fault+0x69c/0x9c8 [ 1934.644909] handle_mm_fault+0xf8/0x200 [ 1934.648740] do_page_fault+0x220/0x508 [ 1934.652477] do_translation_fault+0xa8/0xbc [ 1934.656652] do_mem_abort+0x68/0x118 [ 1934.660216] do_el0_ia_bp_hardening+0x6c/0xd8 [ 1934.664565] el0_ia+0x20/0x24 Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Willy Tarreau 提交于
mainline inclusion from mainline-v5.18-rc5 commit 233087ca category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I59I1C CVE: CVE-2022-1836 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.18&id=233087ca063686964a53c829d547c7571e3f67bf -------------------------------- Minh Yuan reported a concurrency use-after-free issue in the floppy code between raw_cmd_ioctl and seek_interrupt. [ It turns out this has been around, and that others have reported the KASAN splats over the years, but Minh Yuan had a reproducer for it and so gets primary credit for reporting it for this fix - Linus ] The problem is, this driver tends to break very easily and nowadays, nobody is expected to use FDRAWCMD anyway since it was used to manipulate non-standard formats. The risk of breaking the driver is higher than the risk presented by this race, and accessing the device requires privileges anyway. Let's just add a config option to completely disable this ioctl and leave it disabled by default. Distros shouldn't use it, and only those running on antique hardware might need to enable it. Link: https://lore.kernel.org/all/000000000000b71cdd05d703f6bf@google.com/ Link: https://lore.kernel.org/lkml/CAKcFiNC=MfYVW-Jt9A3=FPJpTwCD2PL_ULNCpsCVE5s8ZeBQgQ@mail.gmail.com Link: https://lore.kernel.org/all/CAEAjamu1FRhz6StCe_55XY5s389ZP_xmCF69k987En+1z53=eg@mail.gmail.comReported-by: NMinh Yuan <yuanmingbuaa@gmail.com> Reported-by: syzbot+8e8958586909d62b6840@syzkaller.appspotmail.com Reported-by: Ncruise k <cruise4k@gmail.com> Reported-by: NKyungtae Kim <kt0755@gmail.com> Suggested-by: NLinus Torvalds <torvalds@linuxfoundation.org> Tested-by: NDenis Efremov <efremov@linux.com> Signed-off-by: NWilly Tarreau <w@1wt.eu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NLuo Meng <luomeng12@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 LeoLiu-oc 提交于
zhaoxin inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I40QDN CVE: NA ---------------------------------------------------------------- Fix a logic issue when display Zhaoxin XHCI root hub speed. Signed-off-by: NLeoLiu-oc <LeoLiu-oc@zhaoxin.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
-
由 Mao Minkai 提交于
Sunway inclusion category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/I56XPR -------------------------------- Optimize the use of memb instruction in memset. Rewrite memcpy and use simd instruction to copy data when src and dest are not co-aligned. When data size is larger than 2KB, use _nc store instruction to improve performance. Signed-off-by: NMao Minkai <maominkai@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56XYC -------------------------------- Using sys_sendfile will cause failure in sending large file(>=2GB). This patch fixes the problem. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56UNZ -------------------------------- There's only AT_SYSINFO_EHDR auxiliary vector so far in SW64 ELF. It's used to store vdso base of current task. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Xu Chenjiao 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56XUF -------------------------------- Refactor PME and AER relative codes to use builtin kernel module, one can enable this feature support via selecting CONFIG_PCIEPORTBUS=y, CONFIG_PCIE_PME=y and CONFIG_PCIEAER=y. Since PME and AER events route to CPU by MSI mechanism has been hardcoded and lack of flexibility, we use PCI INTx mechanism to handle these events. Signed-off-by: NXu Chenjiao <xuchenjiao@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Lu Feifei 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56WV8 -------------------------------- When passing through device, 32-bit MEMIO address of guest os is between 3G and 3.5G, and it may produce dma address beyond 3.5G, which is considered to be MEMIO address of host and cause DMA to to fail. To fix it, 32-bit MEMEIO address of guest os should keep consistent with host. Signed-off-by: NLu Feifei <lufeifei@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yingying 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56UNZ -------------------------------- Signed-off-by: NWang Yingying <wangyingying@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Lu Feifei 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56WV8 -------------------------------- Considering that when mmio of the passthrough device establishes a page table mapping, it cannot be in the same page as the emulated mmio address, so the page size is used as boundary when allocating the mmio address resource. Signed-off-by: NLu Feifei <lufeifei@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: performance bugzilla: https://gitee.com/openeuler/kernel/issues/I56XPR -------------------------------- Both context_fpregs and sc_fpregs are contigous, so it may improve performance with copy_{from,to}_user. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG -------------------------------- addl and subl only support 8-bit immediate value, which is not suitable for us to extend struct pt_regs later. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhu Donghong 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56UPH -------------------------------- Provide the KCS and BT of IPMI system interfaces that system software uses for communication with BMC. Since hardware may occur default errors due to lack of BMC hardware, these interface devices are disabled by default. To enable IPMI support, one may enable supported BT/KCS interfaces and also select CONFIG_IPMI_HANDLER=y, CONFIG_IPMI_DEVICE_INTERFACE=y and CONFIG_IPMI_SI=y. Signed-off-by: NZhu Donghong <zhudonghong@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yingying 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56UNZ -------------------------------- Signed-off-by: NWang Yingying <wangyingying@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Wang Yingying 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56UNZ -------------------------------- This patch adds SW64 PVT monitor driver to enable voltage sensor support. Signed-off-by: NWang Yingying <wangyingying@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhao Yihan 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG -------------------------------- BIOS passes logical address as physical address, which starts with PAGE_OFFSET. It will overflow if add PAGE_OFFSET again. Signed-off-by: NZhao Yihan <zhaoyihan@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Mao Minkai 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56P0Z -------------------------------- SW64 uses 256-bit fpregs and has very special indexes for vector regs. Fix CFI directives in vdso rt_sigreturn accordingly. Signed-off-by: NMao Minkai <maominkai@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Mao Minkai 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56P0Z -------------------------------- Add proper CFI directives to vdso rt_sigreturn to make backtrace work properly. Signed-off-by: NMao Minkai <maominkai@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Lu Feifei 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG -------------------------------- Remove struct swvm to make the code more concise. Signed-off-by: NLu Feifei <lufeifei@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG -------------------------------- Suppose that parent traces child which masks all the signals. If trap or fault occurs on child, it will pass through. To do it right, we have to force signal and fault in that case. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Sheng 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56QAM -------------------------------- This option hasn't been tested for a while, and it throws compile errors because the required include header file is missing. Signed-off-by: NHe Sheng <hesheng@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Mao Minkai 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56XYC -------------------------------- Signed-off-by: NMao Minkai <maominkai@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Tang Jinyang 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56QDM -------------------------------- This support must work together with dynamic frequency scaling, which can dynamically turning on/off cores according to global system load. Signed-off-by: NTang Jinyang <tangjinyang@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Tang Jinyang 提交于
Sunway inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I56QDM -------------------------------- Dynamic voltage and frequency scaling (DVFS) is a well-known technique to reduce power/or energy consumption of various applications. But in the past, cpu frequency scaling has to be done manually on sw64. Now we add dynamic frequency scaling support, which allows system to scale frequency dynamically according to workload. Signed-off-by: NTang Jinyang <tangjinyang@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Cui Mingrui 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56W9F -------------------------------- When dest address is not aligned, the checksum result is incorrect. This is caused by wrong usage of insll and inshl. Signed-off-by: NCui Mingrui <cuimingrui@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 He Chuyue 提交于
Sunway inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I56OLG -------------------------------- The `reg` and `r` are always the same, so save one of them and remove the other. Signed-off-by: NHe Chuyue <hechuyue@wxiat.com> Signed-off-by: NGu Zitao <guzitao@wxiat.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-