- 08 9月, 2018 4 次提交
-
-
由 Nadav Amit 提交于
When page-table entries are set, the compiler might optimize their assignment by using multiple instructions to set the PTE. This might turn into a security hazard if the user somehow manages to use the interim PTE. L1TF does not make our lives easier, making even an interim non-present PTE a security hazard. Using WRITE_ONCE() to set PTEs and friends should prevent this potential security hazard. I skimmed the differences in the binary with and without this patch. The differences are (obviously) greater when CONFIG_PARAVIRT=n as more code optimizations are possible. For better and worse, the impact on the binary with this patch is pretty small. Skimming the code did not cause anything to jump out as a security hazard, but it seems that at least move_soft_dirty_pte() caused set_pte_at() to use multiple writes. Signed-off-by: NNadav Amit <namit@vmware.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180902181451.80520-1-namit@vmware.com
-
由 Thomas Gleixner 提交于
activate_managed() returns EINVAL instead of -EINVAL in case of error. While this is unlikely to happen, the positive return value would cause further malfunction at the call site. Fixes: 2db1f959 ("x86/vector: Handle managed interrupts proper") Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
-
由 Wanpeng Li 提交于
Dan Carpenter reported that the untrusted data returns from kvm_register_read() results in the following static checker warning: arch/x86/kvm/lapic.c:576 kvm_pv_send_ipi() error: buffer underflow 'map->phys_map' 's32min-s32max' KVM guest can easily trigger this by executing the following assembly sequence in Ring0: mov $10, %rax mov $0xFFFFFFFF, %rbx mov $0xFFFFFFFF, %rdx mov $0, %rsi vmcall As this will cause KVM to execute the following code-path: vmx_handle_exit() -> handle_vmcall() -> kvm_emulate_hypercall() -> kvm_pv_send_ipi() which will reach out-of-bounds access. This patch fixes it by adding a check to kvm_pv_send_ipi() against map->max_apic_id, ignoring destinations that are not present and delivering the rest. We also check whether or not map->phys_map[min + i] is NULL since the max_apic_id is set to the max apic id, some phys_map maybe NULL when apic id is sparse, especially kvm unconditionally set max_apic_id to 255 to reserve enough space for any xAPIC ID. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Liran Alon <liran.alon@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> [Add second "if (min > map->max_apic_id)" to complete the fix. -Radim] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Liran Alon 提交于
Consider the case L1 had a IRQ/NMI event until it executed VMLAUNCH/VMRESUME which wasn't delivered because it was disallowed (e.g. interrupts disabled). When L1 executes VMLAUNCH/VMRESUME, L0 needs to evaluate if this pending event should cause an exit from L2 to L1 or delivered directly to L2 (e.g. In case L1 don't intercept EXTERNAL_INTERRUPT). Usually this would be handled by L0 requesting a IRQ/NMI window by setting VMCS accordingly. However, this setting was done on VMCS01 and now VMCS02 is active instead. Thus, when L1 executes VMLAUNCH/VMRESUME we force L0 to perform pending event evaluation by requesting a KVM_REQ_EVENT. Note that above scenario exists when L1 KVM is about to enter L2 but requests an "immediate-exit". As in this case, L1 will disable-interrupts and then send a self-IPI before entering L2. Reviewed-by: NNikita Leshchenko <nikita.leshchenko@oracle.com> Co-developed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 07 9月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 06 9月, 2018 2 次提交
-
-
由 Jann Horn 提交于
When the kernel.print-fatal-signals sysctl has been enabled, a simple userspace crash will cause the kernel to write a crash dump that contains, among other things, the kernel gsbase into dmesg. As suggested by Andy, limit output to pt_regs, FS_BASE and KERNEL_GS_BASE in this case. This also moves the bitness-specific logic from show_regs() into process_{32,64}.c. Fixes: 45807a1d ("vdso: print fatal signals") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bpetkov@suse.de> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180831194151.123586-1-jannh@google.com
-
由 Chuanhua Lei 提交于
Loops per jiffy is calculated by multiplying tsc_khz with 1e3 and then dividing it by HZ. Both tsc_khz and the temporary variable holding the multiplication result are of type unsigned long, so on 32bit the result is truncated to the lower 32bit. Use u64 as type for the temporary variable and cast tsc_khz to it before multiplying. [ tglx: Massaged changelog and removed pointless braces ] Fixes: cf7a63ef ("x86/tsc: Calibrate tsc only once") Signed-off-by: NChuanhua Lei <chuanhua.lei@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: yixin.zhu@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Pavel Tatashin <pasha.tatashin@microsoft.com> Cc: Rajvi Jingar <rajvi.jingar@intel.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/1536228203-18701-1-git-send-email-chuanhua.lei@linux.intel.com
-
- 03 9月, 2018 1 次提交
-
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in arch/x86/include/asm/atomic.h that are caused by having a #define macro between the kernel-doc notation and the function name. Fixed by moving the #define macro to after the function implementation. Make the same change for atomic64_{32,64}.h for consistency even though there were no kernel-doc warnings found in these header files, but there would be if they were used in generation of documentation. Fixes these kernel-doc warnings: ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'i' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:84: warning: Excess function parameter 'v' description in 'arch_atomic_sub_and_test' ../arch/x86/include/asm/atomic.h:96: warning: Excess function parameter 'v' description in 'arch_atomic_inc' ../arch/x86/include/asm/atomic.h:109: warning: Excess function parameter 'v' description in 'arch_atomic_dec' ../arch/x86/include/asm/atomic.h:124: warning: Excess function parameter 'v' description in 'arch_atomic_dec_and_test' ../arch/x86/include/asm/atomic.h:138: warning: Excess function parameter 'v' description in 'arch_atomic_inc_and_test' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'i' description in 'arch_atomic_add_negative' ../arch/x86/include/asm/atomic.h:153: warning: Excess function parameter 'v' description in 'arch_atomic_add_negative' Fixes: 18cc1814 ("atomics/treewide: Make test ops optional") Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Link: https://lkml.kernel.org/r/0a1e678d-c8c5-b32c-2640-ed4e94d399d2@infradead.org
-
- 02 9月, 2018 4 次提交
-
-
由 Filippo Sironi 提交于
Handle the case where microcode gets loaded on the BSP's hyperthread sibling first and the boot_cpu_data's microcode revision doesn't get updated because of early exit due to the siblings sharing a microcode engine. For that, simply write the updated revision on all CPUs unconditionally. Signed-off-by: NFilippo Sironi <sironi@amazon.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: prarit@redhat.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1533050970-14385-1-git-send-email-sironi@amazon.de
-
由 Prarit Bhargava 提交于
When preparing an MCE record for logging, boot_cpu_data.microcode is used to read out the microcode revision on the box. However, on systems where late microcode update has happened, the microcode revision output in a MCE log record is wrong because boot_cpu_data.microcode is not updated when the microcode gets updated. But, the microcode revision saved in boot_cpu_data's microcode member should be kept up-to-date, regardless, for consistency. Make it so. Fixes: fa94d0c6 ("x86/MCE: Save microcode revision in machine check records") Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: sironi@amazon.de Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20180731112739.32338-1-prarit@redhat.com
-
由 Randy Dunlap 提交于
Fix the section mismatch warning in arch/x86/mm/pti.c: WARNING: vmlinux.o(.text+0x6972a): Section mismatch in reference from the function pti_clone_pgtable() to the function .init.text:pti_user_pagetable_walk_pte() The function pti_clone_pgtable() references the function __init pti_user_pagetable_walk_pte(). This is often because pti_clone_pgtable lacks a __init annotation or the annotation of pti_user_pagetable_walk_pte is wrong. FATAL: modpost: Section mismatches detected. Fixes: 85900ea5 ("x86/pti: Map the vsyscall page if needed") Reported-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/43a6d6a3-d69d-5eda-da09-0b1c88215a2a@infradead.org
-
由 Samuel Neves 提交于
In the __getcpu function, lsl is using the wrong target and destination registers. Luckily, the compiler tends to choose %eax for both variables, so it has been working so far. Fixes: a582c540 ("x86/vdso: Use RDPID in preference to LSL when available") Signed-off-by: NSamuel Neves <sneves@dei.uc.pt> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndy Lutomirski <luto@kernel.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180901201452.27828-1-sneves@dei.uc.pt
-
- 01 9月, 2018 1 次提交
-
-
由 LuckTony 提交于
The trick with flipping bit 63 to avoid loading the address of the 1:1 mapping of the poisoned page while the 1:1 map is updated used to work when unmapping the page. But it falls down horribly when attempting to directly set the page as uncacheable. The problem is that when the cache mode is changed to uncachable, the pages needs to be flushed from the cache first. But the decoy address is non-canonical due to bit 63 flipped, and the CLFLUSH instruction throws a #GP fault. Add code to change_page_attr_set_clr() to fix the address before calling flush. Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@alien8.de> Cc: linux-edac <linux-edac@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Link: https://lkml.kernel.org/r/20180831165506.GA9605@agluck-desk
-
- 31 8月, 2018 4 次提交
-
-
由 Joerg Roedel 提交于
When PTI is enabled on x86-32 the kernel uses the GDT mapped in the fixmap for the simple reason that this address is also mapped for user-space. The efi_call_phys_prolog()/efi_call_phys_epilog() wrappers change the GDT to call EFI runtime services and switch back to the kernel GDT when they return. But the switch-back uses the writable GDT, not the fixmap GDT. When that happened and and the CPU returns to user-space it switches to the user %cr3 and tries to restore user segment registers. This fails because the writable GDT is not mapped in the user page-table, and without a GDT the fault handlers also can't be launched. The result is a triple fault and reboot of the machine. Fix that by restoring the GDT back to the fixmap GDT which is also mapped in the user page-table. Fixes: 7757d607 x86/pti: ('Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32') Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NGuenter Roeck <linux@roeck-us.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: hpa@zytor.com Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/1535702738-10971-1-git-send-email-joro@8bytes.org
-
由 Andy Lutomirski 提交于
A NMI can hit in the middle of context switching or in the middle of switch_mm_irqs_off(). In either case, CR3 might not match current->mm, which could cause copy_from_user_nmi() and friends to read the wrong memory. Fix it by adding a new nmi_uaccess_okay() helper and checking it in copy_from_user_nmi() and in __copy_from_user_nmi()'s callers. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NRik van Riel <riel@surriel.com> Cc: Nadav Amit <nadav.amit@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jann Horn <jannh@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/dd956eba16646fd0b15c3c0741269dfd84452dac.1535557289.git.luto@kernel.org
-
由 Ben Hutchings 提交于
When bootstrapping an architecture, it's usual to generate the kernel's user-space headers (make headers_install) before building a compiler. Move the compiler check (for asm goto support) to the archprepare target so that it is only done when building code for the target. Fixes: e501ce95 ("x86: Force asm-goto") Reported-by: NHelmut Grohne <helmutg@debian.org> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180829194317.GA4765@decadent.org.uk
-
由 Jann Horn 提交于
show_opcodes() is used both for dumping kernel instructions and for dumping user instructions. If userspace causes #PF by jumping to a kernel address, show_opcodes() can be reached with regs->ip controlled by the user, pointing to kernel code. Make sure that userspace can't trick us into dumping kernel memory into dmesg. Fixes: 7cccf072 ("x86/dumpstack: Add a show_ip() function") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKees Cook <keescook@chromium.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: security@kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
-
- 30 8月, 2018 15 次提交
-
-
由 Sean Christopherson 提交于
Allowing x86_emulate_instruction() to be called directly has led to subtle bugs being introduced, e.g. not setting EMULTYPE_NO_REEXECUTE in the emulation type. While most of the blame lies on re-execute being opt-out, exporting x86_emulate_instruction() also exposes its cr2 parameter, which may have contributed to commit d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") using x86_emulate_instruction() instead of emulate_instruction() because "hey, I have a cr2!", which in turn introduced its EMULTYPE_NO_REEXECUTE bug. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Lack of the kvm_ prefix gives the impression that it's a VMX or SVM specific function, and there's no conflict that prevents adding the kvm_ prefix. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Commit a6f177ef ("KVM: Reenter guest after emulation failure if due to access to non-mmio address") added reexecute_instruction() to handle the scenario where two (or more) vCPUS race to write a shadowed page, i.e. reexecute_instruction() is intended to return true if and only if the instruction being emulated was accessing a shadowed page. As L0 is only explicitly shadowing L1 tables, an emulation failure of a nested VM instruction cannot be due to a race to write a shadowed page and so should never be re-executed. This fixes an issue where an "MMIO" emulation failure[1] in L2 is all but guaranteed to result in an infinite loop when TDP is enabled. Because "cr2" is actually an L2 GPA when TDP is enabled, calling kvm_mmu_gva_to_gpa_write() to translate cr2 in the non-direct mapped case (L2 is never direct mapped) will almost always yield UNMAPPED_GVA and cause reexecute_instruction() to immediately return true. The !mmio_info_in_cache() check in kvm_mmu_page_fault() doesn't catch this case because mmio_info_in_cache() returns false for a nested MMU (the MMIO caching currently handles L1 only, e.g. to cache nested guests' GPAs we'd have to manually flush the cache when switching between VMs and when L1 updated its page tables controlling the nested guest). Way back when, commit 68be0803 ("KVM: x86: never re-execute instruction with enabled tdp") changed reexecute_instruction() to always return false when using TDP under the assumption that KVM would only get into the emulator for MMIO. Commit 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") effectively reverted that behavior in order to handle the scenario where emulation failed due to an access from L1 to the shadow page tables for L2, but it didn't account for the case where emulation failed in L2 with TDP enabled. All of the above logic also applies to retry_instruction(), added by commit 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions"). An indefinite loop in retry_instruction() should be impossible as it protects against retrying the same instruction over and over, but it's still correct to not retry an L2 instruction in the first place. Fix the immediate issue by adding a check for a nested guest when determining whether or not to allow retry in kvm_mmu_page_fault(). In addition to fixing the immediate bug, add WARN_ON_ONCE in the retry functions since they are not designed to handle nested cases, i.e. they need to be modified even if there is some scenario in the future where we want to allow retrying a nested guest. [1] This issue was encountered after commit 3a2936de ("kvm: mmu: Don't expose private memslots to L2") changed the page fault path to return KVM_PFN_NOSLOT when translating an L2 access to a prive memslot. Returning KVM_PFN_NOSLOT is semantically correct when we want to hide a memslot from L2, i.e. there effectively is no defined memory region for L2, but it has the unfortunate side effect of making KVM think the GFN is a MMIO page, thus triggering emulation. The failure occurred with in-development code that deliberately exposed a private memslot to L2, which L2 accessed with an instruction that is not emulated by KVM. Fixes: 95b3cf69 ("KVM: x86: let reexecute_instruction work for tdp") Fixes: 1cb3f3ae ("KVM: x86: retry non-page-table writing instructions") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: Jim Mattson <jmattson@google.com> Cc: Krish Sadhukhan <krish.sadhukhan@oracle.com> Cc: Xiao Guangrong <xiaoguangrong@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Effectively force kvm_mmu_page_fault() to opt-in to allowing retry to make it more obvious when and why it allows emulation to be retried. Previously this approach was less convenient due to retry and re-execute behavior being controlled by separate flags that were also inverted in their implementations (opt-in versus opt-out). Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
retry_instruction() and reexecute_instruction() are a package deal, i.e. there is no scenario where one is allowed and the other is not. Merge their controlling emulation type flags to enforce this in code. Name the combined flag EMULTYPE_ALLOW_RETRY to make it abundantly clear that we are allowing re{try,execute} to occur, as opposed to explicitly requesting retry of a previously failed instruction. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution of an instruction after emulation decode failure is intended to be used only when emulating shadow page accesses. Invert the flag to make allowing re-execution opt-in since that behavior is by far in the minority. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of RSM emulation. Add a new helper, kvm_emulate_instruction_from_buffer(), to support emulating from a pre-defined buffer. This eliminates the last direct call to x86_emulate_instruction() outside of kvm_mmu_page_fault(), which means x86_emulate_instruction() can be unexported in a future patch. Fixes: 7607b717 ("KVM: SVM: install RSM intercept") Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of MMIO emulation. As handle_ept_misconfig() is only used for MMIO emulation, it should pass EMULTYPE_NO_REEXECUTE when using the emulator to skip an instr in the fast-MMIO case where VM_EXIT_INSTRUCTION_LEN is invalid. And because the cr2 value passed to x86_emulate_instruction() is only destined for use when retrying or reexecuting, we can simply call emulate_instruction(). Fixes: d391f120 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested") Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Colin Ian King 提交于
Variable dst_vaddr_end is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'dst_vaddr_end' set but not used [-Wunused-but-set-variable] Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Vitaly Kuznetsov 提交于
nested_run_pending is set 20 lines above and check_vmentry_prereqs()/ check_vmentry_postreqs() don't seem to be resetting it (the later, however, checks it). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Reviewed-by: NEduardo Valentin <eduval@amazon.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Uros Bizjak 提交于
Replace open-coded set instructions with CC_SET()/CC_OUT(). Signed-off-by: NUros Bizjak <ubizjak@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20180814165951.13538-1-ubizjak@gmail.com
-
由 Jiri Kosina 提交于
text_poke() and text_poke_bp() must be called with text_mutex held. Put proper lockdep anotation in place instead of just mentioning the requirement in a comment. Reported-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NJiri Kosina <jkosina@suse.cz> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NMasami Hiramatsu <mhiramat@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1808280853520.25787@cbobk.fhfr.pm
-
由 Jann Horn 提交于
Reset the KASAN shadow state of the task stack before rewinding RSP. Without this, a kernel oops will leave parts of the stack poisoned, and code running under do_exit() can trip over such poisoned regions and cause nonsensical false-positive KASAN reports about stack-out-of-bounds bugs. This does not wipe the exception stacks; if an oops happens on an exception stack, it might result in random KASAN false-positives from other tasks afterwards. This is probably relatively uninteresting, since if the kernel oopses on an exception stack, there are most likely bigger things to worry about. It'd be more interesting if vmapped stacks and KASAN were compatible, since then handle_stack_overflow() would oops from exception stack context. Fixes: 2deb4be2 ("x86/dumpstack: When OOPSing, rewind the stack before do_exit()") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: kasan-dev@googlegroups.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180828184033.93712-1-jannh@google.com
-
由 Nick Desaulniers 提交于
This should have been marked extern inline in order to pick up the out of line definition in arch/x86/kernel/irqflags.S. Fixes: 208cbb32 ("x86/irqflags: Provide a declaration for native_save_fl") Reported-by: NBen Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJuergen Gross <jgross@suse.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180827214011.55428-1-ndesaulniers@google.com
-
由 Masahiro Yamada 提交于
Commit cafa0010 ("Raise the minimum required gcc version to 4.6") bumped the minimum GCC version to 4.6 for all architectures. Remove the workaround code. It was the only user of cc-if-fullversion. Remove the macro as well. Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/1535348714-25457-1-git-send-email-yamada.masahiro@socionext.com
-
- 29 8月, 2018 1 次提交
-
-
由 Colin Ian King 提交于
Variable save_pud is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'save_pud' set but not used [-Wunused-but-set-variable] Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 28 8月, 2018 1 次提交
-
-
由 Juergen Gross 提交于
Using only 32-bit writes for the pte will result in an intermediate L1TF vulnerable PTE. When running as a Xen PV guest this will at once switch the guest to shadow mode resulting in a loss of performance. Use arch_atomic64_xchg() instead which will perform the requested operation atomically with all 64 bits. Some performance considerations according to: https://software.intel.com/sites/default/files/managed/ad/dc/Intel-Xeon-Scalable-Processor-throughput-latency.pdf The main number should be the latency, as there is no tight loop around native_ptep_get_and_clear(). "lock cmpxchg8b" has a latency of 20 cycles, while "lock xchg" (with a memory operand) isn't mentioned in that document. "lock xadd" (with xadd having 3 cycles less latency than xchg) has a latency of 11, so we can assume a latency of 14 for "lock xchg". Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NJan Beulich <jbeulich@suse.com> Tested-by: NJason Andryuk <jandryuk@gmail.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 27 8月, 2018 4 次提交
-
-
由 Juergen Gross 提交于
In some cases 32-bit PAE PV guests still write PTEs directly instead of using hypercalls. This is especially bad when clearing a PTE as this is done via 32-bit writes which will produce intermediate L1TF attackable PTEs. Change the code to use hypercalls instead. Signed-off-by: NJuergen Gross <jgross@suse.com> Reviewed-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
由 Nikolas Nyby 提交于
Fix a typo in the Kconfig help text: adverticed -> advertised. Signed-off-by: NNikolas Nyby <nikolas@gnu.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: trivial@kernel.org Cc: tglx@linutronix.de Cc: x86@kernel.org Link: https://lkml.kernel.org/r/20180825231054.23813-1-nikolas@gnu.org
-
由 Andi Kleen 提交于
On Nehalem and newer core CPUs the CPU cache internally uses 44 bits physical address space. The L1TF workaround is limited by this internal cache address width, and needs to have one bit free there for the mitigation to work. Older client systems report only 36bit physical address space so the range check decides that L1TF is not mitigated for a 36bit phys/32GB system with some memory holes. But since these actually have the larger internal cache width this warning is bogus because it would only really be needed if the system had more than 43bits of memory. Add a new internal x86_cache_bits field. Normally it is the same as the physical bits field reported by CPUID, but for Nehalem and newerforce it to be at least 44bits. Change the L1TF memory size warning to use the new cache_bits field to avoid bogus warnings and remove the bogus comment about memory size. Fixes: 17dbca11 ("x86/speculation/l1tf: Add sysfs reporting for l1tf") Reported-by: NGeorge Anchev <studio@anchev.net> Reported-by: NChristopher Snowhill <kode54@gmail.com> Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: Michael Hocko <mhocko@suse.com> Cc: vbabka@suse.cz Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-1-andi@firstfloor.org
-
由 Andi Kleen 提交于
The check for Spectre microcodes does not check for family 6, only the model numbers. Add a family 6 check to avoid ambiguity with other families. Fixes: a5b29663 ("x86/cpufeature: Blacklist SPEC_CTRL/PRED_CMD on early Spectre v2 microcodes") Signed-off-by: NAndi Kleen <ak@linux.intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20180824170351.34874-2-andi@firstfloor.org
-
- 25 8月, 2018 1 次提交
-
-
由 Dave Watson 提交于
A regression was reported bisecting to 1476db2d "Move HashKey computation from stack to gcm_context". That diff moved HashKey computation from the stack, which was explicitly aligned in the asm, to a struct provided from the C code, depending on AESNI_ALIGN_ATTR for alignment. It appears some compilers may not align this struct correctly, resulting in a crash on the movdqa instruction when attempting to encrypt or decrypt data. Fix by using unaligned loads for the HashKeys. On modern hardware there is no perf difference between the unaligned and aligned loads. All other accesses to gcm_context_data already use unaligned loads. Reported-by: NMauro Rossi <issor.oruam@gmail.com> Fixes: 1476db2d ("Move HashKey computation from stack to gcm_context") Cc: <stable@vger.kernel.org> Signed-off-by: NDave Watson <davejwatson@fb.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 24 8月, 2018 1 次提交
-
-
由 Vlastimil Babka 提交于
Two users have reported [1] that they have an "extremely unlikely" system with more than MAX_PA/2 memory and L1TF mitigation is not effective. Make the warning more helpful by suggesting the proper mem=X kernel boot parameter to make it effective and a link to the L1TF document to help decide if the mitigation is worth the unusable RAM. [1] https://bugzilla.suse.com/show_bug.cgi?id=1105536Suggested-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: "H . Peter Anvin" <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/966571f0-9d7f-43dc-92c6-a10eec7a1254@suse.cz
-