- 22 7月, 2019 3 次提交
-
-
由 Wanpeng Li 提交于
The idea before commit 240c35a3 (which has just been reverted) was that we have the following FPU states: userspace (QEMU) guest --------------------------------------------------------------------------- processor vcpu->arch.guest_fpu >>> KVM_RUN: kvm_load_guest_fpu vcpu->arch.user_fpu processor >>> preempt out vcpu->arch.user_fpu current->thread.fpu >>> preempt in vcpu->arch.user_fpu processor >>> back to userspace >>> kvm_put_guest_fpu processor vcpu->arch.guest_fpu --------------------------------------------------------------------------- With the new lazy model we want to get the state back to the processor when schedule in from current->thread.fpu. Reported-by: NThomas Lambertz <mail@thomaslambertz.de> Reported-by: Nanthony <antdev66@gmail.com> Tested-by: Nanthony <antdev66@gmail.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Thomas Lambertz <mail@thomaslambertz.de> Cc: anthony <antdev66@gmail.com> Cc: stable@vger.kernel.org Fixes: 5f409e20 (x86/fpu: Defer FPU state load until return to userspace) Signed-off-by: NWanpeng Li <wanpengli@tencent.com> [Add a comment in front of the warning. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This reverts commit 240c35a3 ("kvm: x86: Use task structs fpu field for user", 2018-11-06). The commit is broken and causes QEMU's FPU state to be destroyed when KVM_RUN is preempted. Fixes: 240c35a3 ("kvm: x86: Use task structs fpu field for user") Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jan Kiszka 提交于
Letting this pend may cause nested_get_vmcs12_pages to run against an invalid state, corrupting the effective vmcs of L1. This was triggerable in QEMU after a guest corruption in L2, followed by a L1 reset. Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Fixes: 7f7f1ba3 ("KVM: x86: do not load vmcs12 pages while still in SMM") Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 7月, 2019 6 次提交
-
-
由 Eric Hankland 提交于
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: NEric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
If a KVM guest is reset while running a nested guest, free_nested will disable the shadow VMCS execution control in the vmcs01. However, on the next KVM_RUN vmx_vcpu_run would nevertheless try to sync the VMCS12 to the shadow VMCS which has since been freed. This causes a vmptrld of a NULL pointer on my machime, but Jan reports the host to hang altogether. Let's see how much this trivial patch fixes. Reported-by: NJan Kiszka <jan.kiszka@siemens.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
This is useful for debugging, and is ratelimited nowadays. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Like Xu 提交于
If a perf_event creation fails due to any reason of the host perf subsystem, it has no chance to log the corresponding event for guest which may cause abnormal sampling data in guest result. In debug mode, this message helps to understand the state of vPMC and we may not limit the number of occurrences but not in a spamming style. Suggested-by: NJoe Perches <joe@perches.com> Signed-off-by: NLike Xu <like.xu@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
When CPU raise #NPF on guest data access and guest CR4.SMAP=1, it is possible that CPU microcode implementing DecodeAssist will fail to read bytes of instruction which caused #NPF. This is AMD errata 1096 and it happens because CPU microcode reading instruction bytes incorrectly attempts to read code as implicit supervisor-mode data accesses (that is, just like it would read e.g. a TSS), which are susceptible to SMAP faults. The microcode reads CS:RIP and if it is a user-mode address according to the page tables, the processor gives up and returns no instruction bytes. In this case, GuestIntrBytes field of the VMCB on a VMEXIT will incorrectly return 0 instead of the correct guest instruction bytes. Current KVM code attemps to detect and workaround this errata, but it has multiple issues: 1) It mistakenly checks if guest CR4.SMAP=0 instead of guest CR4.SMAP=1, which is required for encountering a SMAP fault. 2) It assumes SMAP faults can only occur when guest CPL==3. However, in case guest CR4.SMEP=0, the guest can execute an instruction which reside in a user-accessible page with CPL<3 priviledge. If this instruction raise a #NPF on it's data access, then CPU DecodeAssist microcode will still encounter a SMAP violation. Even though no sane OS will do so (as it's an obvious priviledge escalation vulnerability), we still need to handle this semanticly correct in KVM side. Note that (2) *is* a useful optimization, because CR4.SMAP=1 is an easy triggerable condition and guests usually enable SMAP together with SMEP. If the vCPU has CR4.SMEP=1, the errata could indeed be encountered onlt at guest CPL==3; otherwise, the CPU would raise a SMEP fault to guest instead of #NPF. We keep this condition to avoid false positives in the detection of the errata. In addition, to avoid future confusion and improve code readbility, include details of the errata in code and not just in commit message. Fixes: 05d5a486 ("KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation)") Cc: Singh Brijesh <brijesh.singh@amd.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Dedicated instances are currently disturbed by unnecessary jitter due to the emulated lapic timers firing on the same pCPUs where the vCPUs reside. There is no hardware virtual timer on Intel for guest like ARM, so both programming timer in guest and the emulated timer fires incur vmexits. This patch tries to avoid vmexit when the emulated timer fires, at least in dedicated instance scenario when nohz_full is enabled. In that case, the emulated timers can be offload to the nearest busy housekeeping cpus since APICv has been found for several years in server processors. The guest timer interrupt can then be injected via posted interrupts, which are delivered by the housekeeping cpu once the emulated timer fires. The host should tuned so that vCPUs are placed on isolated physical processors, and with several pCPUs surplus for busy housekeeping. If disabled mwait/hlt/pause vmexits keep the vCPUs in non-root mode, ~3% redis performance benefit can be observed on Skylake server, and the number of external interrupt vmexits drops substantially. Without patch VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 42916 49.43% 39.30% 0.47us 106.09us 0.71us ( +- 1.09% ) While with patch: VM-EXIT Samples Samples% Time% Min Time Max Time Avg time EXTERNAL_INTERRUPT 6871 9.29% 2.96% 0.44us 57.88us 0.72us ( +- 4.02% ) Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 7月, 2019 1 次提交
-
-
由 Wanpeng Li 提交于
Commit 61abdbe0 ("kvm: x86: make lapic hrtimer pinned") pinned the lapic timer to avoid to wait until the next kvm exit for the guest to see KVM_REQ_PENDING_TIMER set. There is another solution to give a kick after setting the KVM_REQ_PENDING_TIMER bit, make lapic timer unpinned will be used in follow up patches. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 7月, 2019 1 次提交
-
-
由 Like Xu 提交于
To avoid semantic inconsistency, the fixed_counters in Intel vPMU need to be reset to 0 in intel_pmu_reset() as gp_counters does. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 7月, 2019 2 次提交
-
-
由 Liran Alon 提交于
As reported by Maxime at https://bugzilla.kernel.org/show_bug.cgi?id=204175: In vmx/nested.c::get_vmx_mem_address(), when the guest runs in long mode, the base address of the memory operand is computed with a simple: *ret = s.base + off; This is incorrect, the base applies only to FS and GS, not to the others. Because of that, if the guest uses a VMX instruction based on DS and has a DS.base that is non-zero, KVM wrongfully adds the base to the resulting address. Reported-by: NMaxime Villard <max@m00nbsd.net> Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yi Wang 提交于
The ioapic_debug and apic_debug have been not used for years, and kvm tracepoints are enough for debugging, so remove them as Paolo suggested. However, there may be something wrong when pv evi get/put user, so it's better to retain some log there. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 7月, 2019 5 次提交
-
-
由 Yi Wang 提交于
There are some pr_debug in TSC code, which may have been no use, so remove them as Paolo suggested. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yi Wang 提交于
This fixes the following coccinelle warning: WARNING: return of 0/1 in function 'vmx_need_emulation_on_page_fault' with return type bool Return false instead of 0. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arnd Bergmann 提交于
clang finds a contruct suspicious that converts an unsigned character to a signed integer and back, causing an overflow: arch/x86/kvm/mmu.c:4605:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -205 to 51 [-Werror,-Wconstant-conversion] u8 wf = (pfec & PFERR_WRITE_MASK) ? ~w : 0; ~~ ^~ arch/x86/kvm/mmu.c:4607:38: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -241 to 15 [-Werror,-Wconstant-conversion] u8 uf = (pfec & PFERR_USER_MASK) ? ~u : 0; ~~ ^~ arch/x86/kvm/mmu.c:4609:39: error: implicit conversion from 'int' to 'u8' (aka 'unsigned char') changes value from -171 to 85 [-Werror,-Wconstant-conversion] u8 ff = (pfec & PFERR_FETCH_MASK) ? ~x : 0; ~~ ^~ Add an explicit cast to tell clang that everything works as intended here. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Link: https://github.com/ClangBuiltLinux/linux/issues/95Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Arnd Bergmann 提交于
Clang notices a code path in which some variables are never initialized, but fails to figure out that this can never happen on i386 because is_64_bit_mode() always returns false. arch/x86/kvm/hyperv.c:1610:6: error: variable 'ingpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (!longmode) { ^~~~~~~~~ arch/x86/kvm/hyperv.c:1632:55: note: uninitialized use occurs here trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa); ^~~~~ arch/x86/kvm/hyperv.c:1610:2: note: remove the 'if' if its condition is always true if (!longmode) { ^~~~~~~~~~~~~~~ arch/x86/kvm/hyperv.c:1595:18: note: initialize the variable 'ingpa' to silence this warning u64 param, ingpa, outgpa, ret = HV_STATUS_SUCCESS; ^ = 0 arch/x86/kvm/hyperv.c:1610:6: error: variable 'outgpa' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] arch/x86/kvm/hyperv.c:1610:6: error: variable 'param' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] Flip the condition around to avoid the conditional execution on i386. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jing Liu 提交于
AVX512 BFLOAT16 instructions support 16-bit BFLOAT16 floating-point format (BF16) for deep learning optimization. Intel adds AVX512 BFLOAT16 feature in CooperLake, which is CPUID.7.1.EAX[5]. Detailed information of the CPUID bit can be found here, https://software.intel.com/sites/default/files/managed/c5/15/\ architecture-instruction-set-extensions-programming-reference.pdf. Signed-off-by: NJing Liu <jing2.liu@linux.intel.com> [Fix type mismatch in min, changing constant "1" to "1u". - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 7月, 2019 7 次提交
-
-
由 Anshuman Khandual 提交于
Drop the pgtable_t variable from all implementation for pte_fn_t as none of them use it. apply_to_pte_range() should stop computing it as well. Should help us save some cycles. Link: http://lkml.kernel.org/r/1556803126-26596-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Acked-by: NMatthew Wilcox <willy@infradead.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: <jglisse@redhat.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mike Rapoport 提交于
Most architectures have identical or very similar implementation of pte_alloc_one_kernel(), pte_alloc_one(), pte_free_kernel() and pte_free(). Add a generic implementation that can be reused across architectures and enable its use on x86. The generic implementation uses GFP_KERNEL | __GFP_ZERO for the kernel page tables and GFP_KERNEL | __GFP_ZERO | __GFP_ACCOUNT for the user page tables. The "base" functions for PTE allocation, namely __pte_alloc_one_kernel() and __pte_alloc_one() are intended for the architectures that require additional actions after actual memory allocation or must use non-default GFP flags. x86 is switched to use generic pte_alloc_one_kernel(), pte_free_kernel() and pte_free(). x86 still implements pte_alloc_one() to allow run-time control of GFP flags required for "userpte" command line option. Link: http://lkml.kernel.org/r/1557296232-15361-2-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Guo Ren <ren_guo@c-sky.com> Cc: Helge Deller <deller@gmx.de> Cc: Ley Foon Tan <lftan@altera.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Creasey <sammy@sammy.net> Cc: Vincent Chen <deanbo422@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
We only support the generic GUP now, so rename the config option to be more clear, and always use the mm/Kconfig definition of the symbol and select it from the arch Kconfigs. Link: http://lkml.kernel.org/r/20190625143715.1689-11-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
The split low/high access is the only non-READ_ONCE version of gup_get_pte that did show up in the various arch implemenations. Lift it to common code and drop the ifdef based arch override. Link: http://lkml.kernel.org/r/20190625143715.1689-4-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Pass in the already calculated end value instead of recomputing it, and leave the end > start check in the callers instead of duplicating them in the arch code. Link: http://lkml.kernel.org/r/20190625143715.1689-3-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJason Gunthorpe <jgg@mellanox.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Miller <davem@davemloft.net> Cc: James Hogan <jhogan@kernel.org> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marco Elver 提交于
This adds a new header to asm-generic to allow optionally instrumenting architecture-specific asm implementations of bitops. This change includes the required change for x86 as reference and changes the kernel API doc to point to bitops-instrumented.h instead. Rationale: the functions in x86's bitops.h are no longer the kernel API functions, but instead the arch_ prefixed functions, which are then instrumented via bitops-instrumented.h. Other architectures can similarly add support for asm implementations of bitops. The documentation text was derived from x86 and existing bitops asm-generic versions: 1) references to x86 have been removed; 2) as a result, some of the text had to be reworded for clarity and consistency. Tested using lib/test_kasan with bitops tests (pre-requisite patch). Bugzilla ref: https://bugzilla.kernel.org/show_bug.cgi?id=198439 Link: http://lkml.kernel.org/r/20190613125950.197667-4-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Acked-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marco Elver 提交于
This patch is a pre-requisite for enabling KASAN bitops instrumentation; using static_cpu_has instead of boot_cpu_has avoids instrumentation of test_bit inside the uaccess region. With instrumentation, the KASAN check would otherwise be flagged by objtool. For consistency, kernel/signal.c was changed to mirror this change, however, is never instrumented with KASAN (currently unsupported under x86 32bit). Link: http://lkml.kernel.org/r/20190613125950.197667-3-elver@google.comSigned-off-by: NMarco Elver <elver@google.com> Suggested-by: NH. Peter Anvin <hpa@zytor.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 7月, 2019 4 次提交
-
-
由 Sean Christopherson 提交于
On VMX, KVM currently does not re-enable irqs until after it has exited the guest context. As a result, a tick that fires in the window between VM-Exit and guest_exit_irqoff() will be accounted as system time. While said window is relatively small, it's large enough to be problematic in some configurations, e.g. if VM-Exits are consistently occurring a hair earlier than the tick irq. Intentionally toggle irqs back off so that guest_exit_irqoff() can be used in lieu of guest_exit() in order to avoid the save/restore of flags in guest_exit(). On my Haswell system, "nop; cli; sti" is ~6 cycles, versus ~28 cycles for "pushf; pop <reg>; cli; push <reg>; popf". Fixes: f2485b3e ("KVM: x86: use guest_exit_irqoff") Reported-by: NWei Yang <w90p710@gmail.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Eric Hankland 提交于
Some events can provide a guest with information about other guests or the host (e.g. L3 cache stats); providing the capability to restrict access to a "safe" set of events would limit the potential for the PMU to be used in any side channel attacks. This change introduces a new VM ioctl that sets an event filter. If the guest attempts to program a counter for any blacklisted or non-whitelisted event, the kernel counter won't be created, so any RDPMC/RDMSR will show 0 instances of that event. Signed-off-by: NEric Hankland <ehankland@google.com> [Lots of changes. All remaining bugs are probably mine. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Eiichi Tsukata 提交于
arch_stack_walk_user() checks `if (fp == frame.next_fp)` to prevent a infinite loop by self reference but it's not enogh for circular reference. Once a lack of return address is found, there is no point to continue the loop, so break out. Fixes: 02b67518 ("tracing: add support for userspace stacktraces in tracing/iter_ctrl") Signed-off-by: NEiichi Tsukata <devel@etsukata.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Link: https://lkml.kernel.org/r/20190711023501.963-1-devel@etsukata.com
-
由 Thomas Gleixner 提交于
The pinning of sensitive CR0 and CR4 bits caused a boot crash when loading the kvm_intel module on a kernel compiled with CONFIG_PARAVIRT=n. The reason is that the static key which controls the pinning is marked RO after init. The kvm_intel module contains a CR4 write which requires to update the static key entry list. That obviously does not work when the key is in a RO section. With CONFIG_PARAVIRT enabled this does not happen because the CR4 write uses the paravirt indirection and the actual write function is built in. As the key is intended to be immutable after init, move native_write_cr0/4() out of line. While at it consolidate the update of the cr4 shadow variable and store the value right away when the pinning is initialized on a booting CPU. No point in reading it back 20 instructions later. This allows to confine the static key and the pinning variable to cpu/common and allows to mark them static. Fixes: 8dbec27a ("x86/asm: Pin sensitive CR0 bits") Fixes: 873d50d5 ("x86/asm: Pin sensitive CR4 bits") Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Reported-by: NXi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NXi Ruoyao <xry111@mengyan1223.wang> Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1907102140340.1758@nanos.tec.linutronix.de
-
- 10 7月, 2019 3 次提交
-
-
由 Arnd Bergmann 提交于
clang points out that the computation of LOWMEM_PAGES causes a signed integer overflow on 32-bit x86: arch/x86/kernel/head32.c:83:20: error: signed shift result (0x100000000) requires 34 bits to represent, but 'int' only has 32 bits [-Werror,-Wshift-overflow] (PAGE_TABLE_SIZE(LOWMEM_PAGES) << PAGE_SHIFT); ^~~~~~~~~~~~ arch/x86/include/asm/pgtable_32.h:109:27: note: expanded from macro 'LOWMEM_PAGES' #define LOWMEM_PAGES ((((2<<31) - __PAGE_OFFSET) >> PAGE_SHIFT)) ~^ ~~ arch/x86/include/asm/pgtable_32.h:98:34: note: expanded from macro 'PAGE_TABLE_SIZE' #define PAGE_TABLE_SIZE(pages) ((pages) / PTRS_PER_PGD) Use the _ULL() macro to make it a 64-bit constant. Fixes: 1e620f9b ("x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190710130522.1802800-1-arnd@arndb.de
-
由 Yi Wang 提交于
We get a warning when build kernel W=1: arch/x86/kvm/../../../virt/kvm/eventfd.c:48:1: warning: no previous prototype for ‘kvm_arch_irqfd_allowed’ [-Wmissing-prototypes] kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args) ^ The reason is kvm_arch_irqfd_allowed() is declared in arch/x86/kvm/irq.h, which is not included by eventfd.c. Considering kvm_arch_irqfd_allowed() is a weakly defined function in eventfd.c, remove the declaration to kvm_host.h can fix this. Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Zijlstra 提交于
KASAN shows the following splat during boot: BUG: KASAN: unknown-crash in unwind_next_frame+0x3f6/0x490 Read of size 8 at addr ffffffff84007db0 by task swapper/0 CPU: 0 PID: 0 Comm: swapper Tainted: G T 5.2.0-rc6-00013-g7457c0da #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x19/0x1b print_address_description+0x1b0/0x2b2 __kasan_report+0x10f/0x171 kasan_report+0x12/0x1c __asan_load8+0x54/0x81 unwind_next_frame+0x3f6/0x490 unwind_next_frame+0x1b/0x23 arch_stack_walk+0x68/0xa5 stack_trace_save+0x7b/0xa0 save_trace+0x3c/0x93 mark_lock+0x1ef/0x9b1 lock_acquire+0x122/0x221 __mutex_lock+0xb6/0x731 mutex_lock_nested+0x16/0x18 _vm_unmap_aliases+0x141/0x183 vm_unmap_aliases+0x14/0x16 change_page_attr_set_clr+0x15e/0x2f2 set_memory_4k+0x2a/0x2c check_bugs+0x11fd/0x1298 start_kernel+0x793/0x7eb x86_64_start_reservations+0x55/0x76 x86_64_start_kernel+0x87/0xaa secondary_startup_64+0xa4/0xb0 Memory state around the buggy address: ffffffff84007c80: 00 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 ffffffff84007d00: f1 00 00 00 00 00 00 00 00 00 f2 f2 f2 f3 f3 f3 >ffffffff84007d80: f3 79 be 52 49 79 be 00 00 00 00 00 00 00 00 f1 It turns out that int3_selftest() is corrupting the stack. The problem is that the KASAN-ified version of int3_magic() is much less trivial than the C code appears. It clobbers several unexpected registers. So when the selftest's INT3 is converted to an emulated call to int3_magic(), the registers are clobbered and Bad Things happen when the function returns. Fix this by converting int3_magic() to the trivial ASM function it should be, avoiding all calling convention issues. Also add ASM_CALL_CONSTRAINT to the INT3 ASM, since it contains a 'CALL'. [peterz: cribbed changelog from josh] Fixes: 7457c0da ("x86/alternatives: Add int3_emulate_call() selftest") Reported-by: Nkernel test robot <rong.a.chen@intel.com> Debugged-by: NJosh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20190709125744.GB3402@hirez.programming.kicks-ass.net
-
- 09 7月, 2019 4 次提交
-
-
由 Jiri Slaby 提交于
common_spurious is currently ENDed erroneously. common_interrupt is used in its ENDPROC. So fix this mistake. Found by my asm macros rewrite patchset. Fixes: f8a8fe61 ("x86/irq: Seperate unused system vectors from spurious entry again") Signed-off-by: NJiri Slaby <jslaby@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190709063402.19847-1-jslaby@suse.cz
-
由 Ross Zwisler 提交于
This reverts commit 392bef70. Per the discussion here: https://lkml.kernel.org/r/201906201042.3BF5CD6@keescook the above referenced commit breaks kernel compilation with old GCC toolchains as well as current versions of the Gold linker. Revert it to fix the regression and to keep the ability to compile the kernel with these tools. Signed-off-by: NRoss Zwisler <zwisler@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NGuenter Roeck <groeck@chromium.org> Cc: <stable@vger.kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Kees Cook <keescook@chromium.org> Cc: Johannes Hirte <johannes.hirte@datenkhaos.de> Cc: Klaus Kusche <klaus.kusche@computerix.info> Cc: samitolvanen@google.com Cc: Guenter Roeck <groeck@google.com> Link: https://lkml.kernel.org/r/20190701155208.211815-1-zwisler@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
The mutex mm->context->lock for init_mm is not initialized for init_mm. This wasn't a problem because it remained unused. This changed however since commit 4fc19708 ("x86/alternatives: Initialize temporary mm for patching") Initialize the mutex for init_mm. Fixes: 4fc19708 ("x86/alternatives: Initialize temporary mm for patching") Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NNadav Amit <namit@vmware.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20190701173354.2pe62hhliok2afea@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Michael Kelley 提交于
Break out parts of mshyperv.h that are ISA independent into a separate file in include/asm-generic. This move facilitates ARM64 code reusing these definitions and avoids code duplication. No functionality or behavior is changed. Signed-off-by: NMichael Kelley <mikelley@microsoft.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 07 7月, 2019 2 次提交
-
-
All fpu__xstate_clear_all_cpu_caps() does is to invoke one simple function since commit 73e3a7d2 ("x86/fpu: Remove the explicit clearing of XSAVE dependent features") so invoke that function directly and remove the wrapper. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190704060743.rvew4yrjd6n33uzx@linutronix.de
-
The command line option `no387' is designed to disable the FPU entirely. This only 'works' with CONFIG_MATH_EMULATION enabled. But on 64bit this cannot work because user space expects SSE to work which required basic FPU support. MATH_EMULATION does not help because SSE is not emulated. The command line option `nofxsr' should also be limited to 32bit because FXSR is part of the required flags on 64bit so turning it off is not possible. Clearing X86_FEATURE_FPU without emulation enabled will not work anyway and hang in fpu__init_system_early_generic() before the console is enabled. Setting additioal dependencies, ensures that the CPU still boots on a modern CPU. Otherwise, dropping FPU will leave FXSR enabled causing the kernel to crash early in fpu__init_system_mxcsr(). With XSAVE support it will crash in fpu__init_cpu_xstate(). The problem is that xsetbv() with XMM set and SSE cleared is not allowed. That means XSAVE has to be disabled. The XSAVE support is disabled in fpu__init_system_xstate_size_legacy() but it is too late. It can be removed, it has been added in commit 1f999ab5 ("x86, xsave: Disable xsave in i387 emulation mode") to use `no387' on a CPU with XSAVE support. All this happens before console output. After hat, the next possible crash is in RAID6 detect code because MMX remained enabled. With a 3DNOW enabled config it will explode in memcpy() for instance due to kernel_fpu_begin() but this is unconditionally enabled. This is enough to boot a Debian Wheezy on a 32bit qemu "host" CPU which supports everything up to XSAVES, AVX2 without 3DNOW. Later, Debian increased the minimum requirements to i686 which means it does not boot userland atleast due to CMOV. After masking the additional features it still keeps SSE4A and 3DNOW* enabled (if present on the host) but those are unused in the kernel. Restrict `no387' and `nofxsr' otions to 32bit only. Add dependencies for FPU, FXSR to additionaly mask CMOV, MMX, XSAVE if FXSR or FPU is cleared. Reported-by: NVegard Nossum <vegard.nossum@oracle.com> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190703083247.57kjrmlxkai3vpw3@linutronix.de
-
- 06 7月, 2019 1 次提交
-
-
由 Wanpeng Li 提交于
Retry tune per-vCPU timer_advance_ns if adaptive tuning goes insane which can happen sporadically in product environment. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 7月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
Replace a magic 64-bit mask with a list of valid registers, computing the same mask in the end. Suggested-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-