- 21 January 2022, 6 commits
-
-
Committed by Atish Patra
The spinwait booting method should only be used for platforms with older firmware without the SBI HSM extension, or for M-mode firmware, because the spinwait method cannot support cpu hotplug, kexec or sparse hartids. It is better to move the entire spinwait implementation into its own config option, which can be disabled if required. It is enabled by default to maintain backward compatibility and M-mode Linux. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Atish Patra
The booting hart selection via lottery is only useful on SMP systems. Moreover, the lottery selection is only necessary for systems using the spinwait booting method. It is better to keep the entire lottery selection together so that it can be disabled in the future. Move the lottery selection code under CONFIG_SMP. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Atish Patra
The __cpu_up_stack/task_pointer array is now only used for the spinwait method. The per-cpu array based lookup is also fragile on platforms with discontiguous/sparse hartids. The spinwait method is only used for M-mode Linux or older firmware without the SBI HSM extension. For general Linux systems, the ordered booting method is preferred anyway to support cpu hotplug and kexec. Make sure that __cpu_up_stack/task_pointer is only used for the spinwait method. Take this opportunity to rename it to __cpu_spinwait_stack/task_pointer to emphasize its purpose as well. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Atish Patra
The HSM extension information log also prints the SBI version v0.2. This is misleading, as the underlying firmware SBI version may be different from v0.2. Remove the unnecessary printing of the SBI version. Signed-off-by: Atish Patra <atishp@rivosinc.com> Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Atish Patra
Currently both the ordered booting and spinwait approaches use a per-cpu array to update the stack & task pointer. This approach does not work in the following cases: 1. NR_CPUS is configured to be less than the highest hart id. 2. The platform has sparse hartids. This issue can be fixed for ordered booting, as the booting cpu brings up one cpu at a time using the SBI HSM extension, which has an opaque parameter that was unused until now. Introduce a common secondary boot data structure that can store the stack and task pointer. Secondary harts use this data while booting up to set up sp & tp. Reviewed-by: Anup Patel <anup@brainfault.org> Signed-off-by: Atish Patra <atishp@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
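A minimal sketch of what such a per-hart boot data structure could look like (the struct and field names here are assumptions for illustration, not necessarily the exact upstream ones):

```c
/* Illustrative sketch: boot data handed to a secondary hart through the
 * "opaque" argument of the SBI HSM hart_start() call, instead of indexing
 * a per-cpu array by a (possibly sparse) hartid.
 */
struct hart_boot_data {
	void *task_ptr;		/* task_struct of the idle thread for this hart */
	void *stack_ptr;	/* top of the initial stack for this hart */
};
```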
-
Committed by Heinrich Schuchardt
The SBI 0.1 specification is obsolete; the current version is 0.3. Hence we should not rely by default on SBI 0.1 being implemented. Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- 20 January 2022, 15 commits
-
-
Committed by kernel test robot
arch/riscv/mm/init.c:48:11-16: WARNING: conversion to bool not needed here. Remove the unneeded conversion to bool. Semantic patch information: relational and logical operators evaluate to bool; explicit conversion is overly verbose and unneeded. Generated by: scripts/coccinelle/misc/boolconv.cocci Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: kernel test robot <lkp@intel.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
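The class of change flagged by boolconv.cocci is trivial; a representative (hypothetical) example:

```c
/* Before: the relational operator already evaluates to bool,
 * so the ternary conversion is redundant. */
bool is_lowmem = (addr < limit) ? true : false;

/* After */
bool is_lowmem = addr < limit;
```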
-
Committed by Alexandre Ghiti
Define precisely the size of the user-accessible virtual address space for the sv32/39/48 mmu types and explain why the whole virtual address space is split into 2 equal chunks between kernel and user space. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
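A hedged sketch of the half-split idea (simplified; the actual macros in arch/riscv differ in detail and naming):

```c
/* Illustrative only: the virtual address space is split into two equal
 * halves, the lower half for user space and the upper half for the kernel,
 * so the user-accessible size is half of 1 << VA_BITS.
 */
#define USER_VA_SIZE	(1UL << (VA_BITS - 1))	/* e.g. 256GB for sv39, 128TB for sv48 */
```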
-
Committed by Alexandre Ghiti
Now that the mmu type is determined at runtime using the SATP characteristics, use the global variable pgtable_l4_enabled to output the mmu type of the processor through /proc/cpuinfo instead of relying on device tree info. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Anup Patel <anup@brainfault.org> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
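A minimal sketch of reporting the MMU type from the runtime flag rather than the device tree (illustrative; the exact upstream function and output strings may differ):

```c
/* Illustrative arch/riscv/kernel/cpu.c-style reporting, assuming the
 * pgtable_l4_enabled global described above and <linux/seq_file.h>.
 */
static void print_mmu(struct seq_file *f)
{
#ifdef CONFIG_64BIT
	seq_printf(f, "mmu\t\t: %s\n", pgtable_l4_enabled ? "sv48" : "sv39");
#else
	seq_printf(f, "mmu\t\t: sv32\n");
#endif
}
```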
-
Committed by Alexandre Ghiti
By adding a new 4th level of page table, give the 64-bit kernel the possibility to address 2^48 bytes of virtual address space: in practice, that offers 128TB of virtual address space to userspace and allows up to 64TB of physical memory. If the underlying hardware does not support sv48, we automatically fall back to a standard 3-level page table by folding the new PUD level into the PGDIR level. In order to detect HW capabilities at runtime, we use the SATP feature that ignores writes with an unsupported mode. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
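The detection trick relies on SATP writes with an unsupported MODE being ignored by the hardware. A heavily simplified sketch of the probe follows (the real code first builds a temporary identity mapping before touching SATP; the helper name here is an assumption):

```c
/* Illustrative only: write an sv48 SATP value and read it back.
 * If the write was ignored, the hardware does not support sv48 and
 * we fall back to sv39 by folding the PUD level into PGDIR.
 */
static __init void probe_sv48(unsigned long root_pfn)
{
	unsigned long satp = SATP_MODE_48 | root_pfn;

	csr_write(CSR_SATP, satp);
	pgtable_l4_enabled = (csr_read(CSR_SATP) == satp);
	csr_write(CSR_SATP, 0);		/* MMU back off */
	local_flush_tlb_all();
}
```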
-
Committed by Alexandre Ghiti
With 4-level page table folding at runtime, we don't know the size of the virtual address space at compile time, so we must set VA_BITS dynamically so that sparsemem reserves the right amount of memory for struct pages. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
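With the fold decided at runtime, VA_BITS can no longer be a plain constant; conceptually it becomes something like the following (a sketch assuming the pgtable_l4_enabled flag from the other patches in this series):

```c
/* Illustrative: VA_BITS depends on whether the 4th page table level is enabled */
#ifdef CONFIG_64BIT
#define VA_BITS		(pgtable_l4_enabled ? 48 : 39)
#else
#define VA_BITS		32
#endif
```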
-
Committed by Alexandre Ghiti
This simply gathers the different pt_ops initializations into functions, with comments added to explain why the page table operations must be changed along the boot process. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Alexandre Ghiti
Now that the kasan shadow region is next to the kernel, for sv48 this region won't be aligned on PGDIR_SIZE, so when populating it we'll need to get down to lower levels of the page table. Instead of reimplementing the page table walk for the early population, take advantage of the existing functions used for the final population. Note that the kasan swapper initialization must also be split: memblock is not initialized at this point, and as the last PGD is shared with the kernel, we'd need to allocate a PUD, so postpone the kasan final population until after the kernel population is done. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Alexandre Ghiti
Now that KASAN_SHADOW_OFFSET is defined at compile time as a config option, this value must remain constant whatever the size of the virtual address space, which is only possible by pushing this region to the end of the address space, next to the kernel mapping. Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Alexandre Ghiti
The CONFIG_MAXPHYSMEM_* options are actually never used; even the nommu defconfigs selecting MAXPHYSMEM_2GB had no effect on PAGE_OFFSET, since it was preempted by the !MMU case right before. In addition, the move of the kernel mapping to the end of the address space broke the use of MAXPHYSMEM_2GB with MMU, since it defines PAGE_OFFSET at the same address as the kernel mapping. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Fixes: 2bfc6cd8 ("riscv: Move kernel mapping outside of linear mapping") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Tested-by: Conor Dooley <Conor.Dooley@microchip.com> Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Jisheng Zhang
eBPF's exception tables also need to be converted to relative offsets, in sync with the switch to the relative extable. Suggested-by: Tong Tiangen <tongtiangen@huawei.com> Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Fixes: 1f77ed94 ("riscv: switch to relative extable and other improvements") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
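For context, a relative exception table stores signed offsets from the entry itself instead of absolute addresses, so the entries stay valid wherever the code lands in memory. A simplified sketch (the upstream riscv entry also carries handler type/data fields):

```c
/* Illustrative relative extable entry: each field is a 32-bit offset
 * from its own address, which is what BPF-generated tables must also
 * emit once the kernel switches to relative extables.
 */
struct exception_table_entry {
	int insn;	/* offset to the faulting instruction */
	int fixup;	/* offset to the fixup code */
};

static inline unsigned long extable_insn_addr(const struct exception_table_entry *ex)
{
	return (unsigned long)&ex->insn + ex->insn;
}
```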
-
Committed by Jisheng Zhang
Currently, the #ifdef CONFIG_XIP_KERNEL usage can be divided into the following three types. The first is for functions/declarations only used in the XIP case. The second is for the XIP_FIXUP case, something like: |foo_type foo; |#ifdef CONFIG_XIP_KERNEL |#define foo (*(foo_type *)XIP_FIXUP(&foo)) |#endif Usually it's better to let the foo macro sit together with the foo variable, but if various foos are defined adjacently, we can save some #ifdef CONFIG_XIP_KERNEL usage by grouping them together. The third is for different implementations for XIP; usually this is an #ifdef...#else...#endif case. This patch moves the pt_ops macro to an adjacent #ifdef CONFIG_XIP_KERNEL block and groups the first-type usages into one. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Jisheng Zhang
Try our best to replace the conditional compilation using "#ifdef CONFIG_XIP_KERNEL" with "IS_ENABLED(CONFIG_XIP_KERNEL)", to simplify the code and to increase compile coverage. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
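A representative (hypothetical) before/after of this kind of conversion; foo_fixup() is a made-up placeholder:

```c
/* Before: the branch is never even parsed on !XIP builds */
#ifdef CONFIG_XIP_KERNEL
	foo_fixup();
#endif

/* After: always compiled, then optimised away when the option is off,
 * which is what increases compile coverage. */
	if (IS_ENABLED(CONFIG_XIP_KERNEL))
		foo_fixup();
```

Note that this only works when every symbol referenced in the branch is visible regardless of the config option, which is why the conversion is "best effort" rather than mechanical.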
-
Committed by Jisheng Zhang
Except for "pt_ops", the other global variables used with CONFIG_XIP_KERNEL=y are defined as below: |foo_type foo; |#ifdef CONFIG_XIP_KERNEL |#define foo (*(foo_type *)XIP_FIXUP(&foo)) |#endif Follow the same approach for pt_ops to unify the style and to simplify the code. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Jisheng Zhang
Try our best to replace the conditional compilation using "#ifdef CONFIG_64BIT" with a check for "IS_ENABLED(CONFIG_64BIT)", to simplify the code and to increase compile coverage. Now we can also remove the __maybe_unused used in the max_mapped_addr declaration. We also remove the BUG_ON check of mapping the last 4K bytes of the addressable memory, since this actually holds for every kernel. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
Committed by Jisheng Zhang
is_kdump_kernel() returns false for the !CRASH_DUMP case, so we don't need the #ifdef CONFIG_CRASH_DUMP around the is_kdump_kernel() check. Signed-off-by: Jisheng Zhang <jszhang@kernel.org> Reviewed-by: Alexandre Ghiti <alex@ghiti.fr> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
-
- 15 January 2022, 19 commits
-
-
Committed by Aneesh Kumar K.V
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Pasha Tatashin
Add page table check hooks into routines that modify user page tables. Link: https://lkml.kernel.org/r/20211221154650.1047963-5-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Pasha Tatashin
Check user page table entries at the time they are added and removed. This allows synchronously catching memory corruption issues related to double mapping. When a pte for an anonymous page is added into a page table, we verify that this pte does not already point to a file-backed page, and vice versa: if a file-backed page is being added, we verify that this page does not have an anonymous mapping. We also enforce that only read-only sharing is allowed for anonymous pages (i.e. cow after fork); all other sharing must be for file pages. Page table check allows protecting and debugging cases where "struct page" metadata became corrupted for some reason, for example when refcnt or mapcount become invalid. Link: https://lkml.kernel.org/r/20211221154650.1047963-4-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Slaby <jirislaby@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <keescook@chromium.org> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wei Xu <weixugc@google.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
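Conceptually the check sits in the pte set/clear paths. A heavily simplified sketch of the invariant being enforced follows; this is not the actual mm/page_table_check.c code, and the page_is_* helpers are hypothetical placeholders for the real per-page anon/file accounting:

```c
/* Illustrative pseudo-check run when a present user PTE is installed:
 * an anonymous mapping must never alias a file-backed page (and vice
 * versa), and anonymous pages may only be shared read-only (CoW).
 */
static void check_new_user_pte(struct page *page, bool pte_is_anon, bool writable)
{
	if (pte_is_anon) {
		BUG_ON(page_is_file_mapped(page));		/* double mapping */
		BUG_ON(writable && page_is_shared(page));	/* writable sharing */
	} else {
		BUG_ON(page_is_anon_mapped(page));		/* double mapping */
	}
}
```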
-
Committed by Arnd Bergmann
linux/mm_types.h should only contain structure definitions, to make it cheap to include elsewhere. The atomic_t helper function definitions are particularly large, so it's better to move the helpers using them into the existing linux/mm_inline.h and only include that where needed. As a follow-up, we may want to go through all the indirect includes in mm_types.h and reduce them as much as possible. Link: https://lkml.kernel.org/r/20211207125710.2503446-2-arnd@kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Colin Cross <ccross@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Yu Zhao <yuzhao@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Qi Zheng
Since commit 4064b982 ("mm: allow VM_FAULT_RETRY for multiple times") allowed VM_FAULT_RETRY multiple times, the FAULT_FLAG_ALLOW_RETRY bit of fault_flag will not be changed in the page fault path, so the following check is no longer needed: flags & FAULT_FLAG_ALLOW_RETRY. So just remove it. [akpm@linux-foundation.org: coding style fixes] Link: https://lkml.kernel.org/r/20211110123358.36511-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Kirill Shutemov <kirill@shutemov.name> Cc: Peter Xu <peterx@redhat.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Chengming Zhou <zhouchengming@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Kefeng Wang
Yongqiang reports a kmemleak panic when a module is insmod/rmmod with KASAN enabled (without KASAN_VMALLOC) on x86 [1]. When the module area allocates memory, its kmemleak object is created successfully, but the KASAN shadow memory of the module allocation is not ready yet, so when kmemleak scans the module's pointer it panics because there is no shadow memory for the KASAN check: the allocation path is module_alloc() -> __vmalloc_node_range() -> kmemleak_vmalloc(), while kmemleak_scan() -> update_checksum() can run before kasan_module_alloc() -> kmemleak_ignore() has taken effect. Note, there is no problem if KASAN_VMALLOC is enabled, since the entire shadow memory of the modules area is preallocated. Thus, the bug only exists on architectures which support dynamic allocation of the module area shadow per module load; for now, only x86/arm64/s390 are involved. Add a VM_DEFER_KMEMLEAK flag and defer the kmemleak registration of the vmalloc'ed object in module_alloc() to fix this issue. [1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/ [wangkefeng.wang@huawei.com: fix build] Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com [akpm@linux-foundation.org: simplify ifdefs, per Andrey] Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com Fixes: 793213a8 ("s390/kasan: dynamic shadow mem allocation for modules") Fixes: 39d114dd ("arm64: add KASAN support") Fixes: bebf56a1 ("kasan: enable instrumentation of global variables") Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
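A hedged sketch of how the new flag is meant to be used on the module_alloc() side (simplified from what the x86/arm64/s390 callers look like; not the exact upstream code):

```c
/* Illustrative module_alloc() caller: defer the kmemleak registration
 * only when KASAN needs a per-module shadow allocation (KASAN without
 * KASAN_VMALLOC), so kmemleak_scan() cannot touch the object before its
 * shadow memory exists.
 */
void *module_alloc(unsigned long size)
{
	unsigned long vm_flags = VM_FLUSH_RESET_PERMS;

	if (IS_ENABLED(CONFIG_KASAN) && !IS_ENABLED(CONFIG_KASAN_VMALLOC))
		vm_flags |= VM_DEFER_KMEMLEAK;

	return __vmalloc_node_range(size, MODULE_ALIGN, MODULES_VADDR,
				    MODULES_END, GFP_KERNEL, PAGE_KERNEL,
				    vm_flags, NUMA_NO_NODE,
				    __builtin_return_address(0));
}
```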
-
Committed by Greg Kroah-Hartman
There are currently two ways to create a set of sysfs files for a kobj_type: through the default_attrs field and through the default_groups field. Move the ia64 topology sysfs code to use the default_groups field, which has been the preferred way since aa30f47c ("kobject: Add support for default attribute groups to kobj_type"), so that we can soon get rid of the obsolete default_attrs field. Link: https://lkml.kernel.org/r/20220104154800.1287947-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
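The conversion pattern looks roughly like this; the kobj_type fields and the ATTRIBUTE_GROUPS() helper are the real kernel ones, while the attribute and ops names are made up for illustration:

```c
/* Before: a NULL-terminated attribute array assigned directly */
static struct attribute *cache_attrs[] = {
	&shared_cpu_map_attr.attr,	/* illustrative attributes */
	&size_attr.attr,
	NULL,
};

/* After: wrap the array in a group and hook it up via default_groups */
ATTRIBUTE_GROUPS(cache);		/* generates cache_groups from cache_attrs */

static struct kobj_type cache_ktype = {
	.sysfs_ops	= &cache_sysfs_ops,
	.default_groups	= cache_groups,	/* was: .default_attrs = cache_attrs */
};
```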
-
Committed by Jason Wang
The word `the' is duplicated in a comment, so the repeated instance should be removed. Link: https://lkml.kernel.org/r/20211113030316.22650-1-wangborong@cdjrlc.com Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Yang Guang
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid open-coding it. Link: https://lkml.kernel.org/r/20211104001908.695110-1-yang.guang5@zte.com.cn Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Cc: David Yang <davidcomponentone@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
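This and the following commit apply the same trivial pattern; a hypothetical example of the replacement:

```c
	int a = 1, b = 2, tmp;

	/* Before: open-coded three-assignment swap */
	tmp = a;
	a = b;
	b = tmp;

	/* After: swap() from include/linux/minmax.h does the same,
	 * with type checking and no visible temporary. */
	swap(a, b);
```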
-
Committed by Yang Guang
Use the macro 'swap()' defined in 'include/linux/minmax.h' to avoid open-coding it. Link: https://lkml.kernel.org/r/20211104062642.1506539-1-yang.guang5@zte.com.cn Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Cc: David Yang <davidcomponentone@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Michael S. Tsirkin
This will enable cleanups down the road. The idea is to disable cbs, then add a "flush_queued_cbs" callback as a parameter; this way drivers can flush any work queued after callbacks have been disabled. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20211013105226.20225-1-mst@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-
Committed by Yang Zhong
Fix the sparse warnings in xstate and remove the inline prefix. Fixes: 980fe2fd ("x86/fpu: Extend fpu_xstate_prctl() with guest permissions") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Reported-by: kernel test robot <lkp@intel.com> Message-Id: <20220113180825.322333-1-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Kevin Tian
Always intercepting IA32_XFD causes non-negligible overhead when this register is updated frequently in the guest. Disable r/w emulation after intercepting the first WRMSR(IA32_XFD) with a non-zero value. Disabling WRMSR emulation implies that IA32_XFD becomes out of sync with the software states in fpstate and the per-cpu xfd cache. This leads to two additional changes accordingly: - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring the software states back in sync with the MSR, before handle_exit_irqoff() is called. - Always trap #NM once write interception is disabled for IA32_XFD. The #NM exception is rare if the guest doesn't use dynamic features; otherwise, there is at most one exception per guest task given a dynamic feature. p.s. We have confirmed that the SDM is being revised to say that when setting IA32_XFD[18] the AMX register state is not guaranteed to be preserved. This clarification avoids adding mess for a creative guest which sets IA32_XFD[18]=1 before saving active AMX state to its own storage. Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-22-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Thomas Gleixner
KVM can disable write emulation for the XFD MSR when the vCPU's fpstate is already correctly sized, to reduce the overhead. When write emulation is disabled, the XFD MSR state after a VMEXIT is unknown and therefore not in sync with the software states in fpstate and the per-CPU XFD cache. Provide fpu_sync_guest_vmexit_xfd_state(), which has to be invoked after a VMEXIT and before enabling interrupts when write emulation is disabled for the XFD MSR. It could be invoked unconditionally even when write emulation is enabled, for the price of a pointless MSR read. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-21-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Guang Zeng
With KVM_CAP_XSAVE, userspace uses a hardcoded 4KB buffer to get/set xstate data from/to KVM. This doesn't work when dynamic xfeatures (e.g. AMX) are exposed to the guest, as they require a larger buffer size. Introduce a new capability (KVM_CAP_XSAVE2). The userspace VMM gets the required xstate buffer size via KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2). KVM_SET_XSAVE is extended to work with both the legacy and the new capability by doing a properly-sized memdup_user() based on the guest fpu container. KVM_GET_XSAVE is kept for backward compatibility; instead, KVM_GET_XSAVE2 is introduced under KVM_CAP_XSAVE2 as the preferred interface for getting an xstate buffer (4KB or larger) from KVM (Link: https://lkml.org/lkml/2021/12/15/510). Also, update the api doc with the new KVM_GET_XSAVE2 ioctl. Signed-off-by: Guang Zeng <guang.zeng@intel.com> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-19-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
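A hedged userspace sketch of the intended flow with the new capability (error handling omitted; fd setup is assumed to have been done elsewhere):

```c
#include <linux/kvm.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Illustrative VMM-side usage: query the needed buffer size through
 * KVM_CHECK_EXTENSION(KVM_CAP_XSAVE2), then fetch the xstate with
 * KVM_GET_XSAVE2 instead of the fixed 4KB struct kvm_xsave.
 */
static void *get_xstate(int vm_fd, int vcpu_fd, size_t *size)
{
	int sz = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_XSAVE2);
	struct kvm_xsave *buf;

	if (sz < (int)sizeof(struct kvm_xsave))
		sz = sizeof(struct kvm_xsave);	/* legacy 4KB minimum */

	buf = calloc(1, sz);
	ioctl(vcpu_fd, KVM_GET_XSAVE2, buf);	/* KVM_SET_XSAVE accepts either size */
	*size = sz;
	return buf;
}
```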
-
Committed by Thomas Gleixner
Userspace needs to inquire KVM about the buffer size to work with the new KVM_SET_XSAVE and KVM_GET_XSAVE2. Add the size info to guest_fpu for KVM to access. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Wei Wang <wei.w.wang@intel.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-18-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jing Liu
Extend CPUID emulation to support XFD, AMX_TILE, AMX_INT8 and AMX_BF16. Adding those bits into kvm_cpu_caps finally activates all the previous logic in this series. Hide XFD on 32-bit host kernels. Otherwise it leads to a weird situation where KVM tells userspace to migrate MSR_IA32_XFD and then rejects attempts to read/write the MSR. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-17-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Jing Liu
Two XCR0 bits are defined for AMX to support the XSAVE mechanism: bit 17 is for tilecfg and bit 18 is for tiledata. The value of XCR0[18:17] is always either 00b or 11b. Also, the SDM recommends that only 64-bit operating systems enable Intel AMX by setting XCR0[18:17]. A 32-bit host kernel never sets the tile bits in vcpu->arch.guest_supported_xcr0. Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Kevin Tian <kevin.tian@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-16-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
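The all-or-nothing constraint on the two tile bits translates into a simple validity check; a hedged sketch (the mask is defined locally here for illustration, matching the bit positions above):

```c
/* Illustrative: XCR0 bit 17 (XTILECFG) and bit 18 (XTILEDATA) must be
 * set or cleared together; reject any guest attempt to set only one.
 */
#define XFEATURE_MASK_XTILE	((1ULL << 17) | (1ULL << 18))

static bool xcr0_tile_bits_valid(u64 xcr0)
{
	u64 tile = xcr0 & XFEATURE_MASK_XTILE;

	return tile == 0 || tile == XFEATURE_MASK_XTILE;
}
```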
-
Committed by Jing Liu
This saves one unnecessary VM-exit in the guest #NM handler, given that the MSR is already restored with the guest value before the guest is resumed. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Yang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-15-yang.zhong@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-