- 23 May 2018, 1 commit
-
-
Submitted by Konrad Rzeszutek Wilk

The X86_FEATURE_SSBD is a synthetic CPU feature - that is, its bit location has no relevance to the real CPUID 0x7.EBX[31] bit position. For that we need the new CPU feature name.

Fixes: 52817587 ("x86/cpufeatures: Disentangle SSBD enumeration")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: "Radim Krčmář" <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Link: https://lkml.kernel.org/r/20180521215449.26423-2-konrad.wilk@oracle.com
-
- 19 May 2018, 2 commits
-
-
Submitted by Borislav Petkov

... into a global, two-dimensional array and service subsequent reads from that cache to avoid rdmsr_on_cpu() calls during CPU hotplug (IPIs with IRQs disabled).

In addition, this fixes a KASAN slab-out-of-bounds read due to wrong usage of the bank->blocks pointer.

Fixes: 27bd5950 ("x86/mce/AMD: Get address from already initialized block")
Reported-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Tested-by: Johannes Hirte <johannes.hirte@datenkhaos.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Yazen Ghannam <yazen.ghannam@amd.com>
Link: http://lkml.kernel.org/r/20180414004230.GA2033@probook
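
A minimal C sketch of the caching approach described above, with illustrative array names and size limits rather than the actual kernel symbols: each block address is stored once while it can still be read cheaply, and later lookups (for example during hotplug, with IRQs disabled) are served from the table instead of issuing rdmsr_on_cpu().

    #define NR_BANKS_MAX    32      /* illustrative limits, not the kernel's */
    #define NR_BLOCKS_MAX   5

    static u32 block_addr_cache[NR_BANKS_MAX][NR_BLOCKS_MAX];

    /* Fill the cache while it is safe to read the MSR on this CPU. */
    static void cache_block_address(unsigned int bank, unsigned int block, u32 addr)
    {
            block_addr_cache[bank][block] = addr;
    }

    /* Hotplug-time readers only ever touch the cache. */
    static u32 get_block_address(unsigned int bank, unsigned int block)
    {
            return block_addr_cache[bank][block];
    }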
-
Submitted by Dmitry Safonov

The x86 mmap() code selects the mmap base for an allocation depending on the bitness of the syscall. For 64-bit syscalls it selects mm->mmap_base and for 32-bit mm->mmap_compat_base. exec() calls mmap(), which in turn uses in_compat_syscall() to check whether the mapping is for a 32-bit or a 64-bit task. The decision is made on the following criteria:

    ia32    child->thread.status & TS_COMPAT
    x32     child->pt_regs.orig_ax & __X32_SYSCALL_BIT
    ia64    !ia32 && !x32

__set_personality_x32() was dropping the TS_COMPAT flag, but set_personality_64bit() has kept the compat syscall flag, making in_compat_syscall() return true during the first exec() syscall. This has user-visible effects, mentioned by Alexey:

1) It breaks ASAN:

    $ gcc -fsanitize=address wrap.c -o wrap-asan
    $ ./wrap32 ./wrap-asan true
    ==1217==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
    ==1217==ASan shadow was supposed to be located in the [0x00007fff7000-0x10007fff7fff] range.
    ==1217==Process memory map follows:
        0x000000400000-0x000000401000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000600000-0x000000601000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x000000601000-0x000000602000   /home/izbyshev/test/gcc/asan-exec-from-32bit/wrap-asan
        0x0000f7dbd000-0x0000f7de2000   /lib64/ld-2.27.so
        0x0000f7fe2000-0x0000f7fe3000   /lib64/ld-2.27.so
        0x0000f7fe3000-0x0000f7fe4000   /lib64/ld-2.27.so
        0x0000f7fe4000-0x0000f7fe5000
        0x7fed9abff000-0x7fed9af54000
        0x7fed9af54000-0x7fed9af6b000   /lib64/libgcc_s.so.1
        [snip]

2) It doesn't seem to be great for security if an attacker always knows that ld.so is going to be mapped into the first 4GB in this case (the same thing happens for PIEs as well).

The testcase:

    $ cat wrap.c
    int main(int argc, char *argv[]) {
            execvp(argv[1], &argv[1]);
            return 127;
    }

    $ gcc wrap.c -o wrap
    $ LD_SHOW_AUXV=1 ./wrap ./wrap true |& grep AT_BASE
    AT_BASE:         0x7f63b8309000
    AT_BASE:         0x7faec143c000
    AT_BASE:         0x7fbdb25fa000

    $ gcc -m32 wrap.c -o wrap32
    $ LD_SHOW_AUXV=1 ./wrap32 ./wrap true |& grep AT_BASE
    AT_BASE:         0xf7eff000
    AT_BASE:         0xf7cee000
    AT_BASE:         0x7f8b9774e000

Fixes: 1b028f78 ("x86/mm: Introduce mmap_compat_base() for 32-bit mmap()")
Fixes: ada26481 ("x86/mm: Make in_compat_syscall() work during exec")
Reported-by: Alexey Izbyshev <izbyshev@ispras.ru>
Bisected-by: Alexander Monakov <amonakov@ispras.ru>
Investigated-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Alexander Monakov <amonakov@ispras.ru>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180517233510.24996-1-dima@arista.com
-
- 18 May 2018, 3 commits
-
-
Submitted by Konrad Rzeszutek Wilk

The "336996 Speculative Execution Side Channel Mitigations" document from May defines this as SSB_NO, hence let's sync up.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Submitted by Thomas Gleixner

Rick bisected a regression on large systems which use the x2apic cluster mode for interrupt delivery to the commit which reworked the cluster management.

The problem is caused by a missing initialization of the clusterid field in the shared cluster data structures, so all structures end up with cluster ID 0, which only allows sharing between CPUs which belong to cluster 0. All other CPUs with a cluster ID > 0 cannot share the data structure because they cannot find existing data with their cluster ID. This causes malfunction with IPIs, because IPIs are sent to the wrong cluster and the caller waits forever for the target CPU to handle the IPI.

Add the missing initialization when an upcoming CPU is the first in a cluster, so that the later booting CPUs can find the data and share it for proper operation.

Fixes: 023a6117 ("x86/apic/x2apic: Simplify cluster management")
Reported-by: Rick Warner <rick@microway.com>
Bisected-by: Rick Warner <rick@microway.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Rick Warner <rick@microway.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1805171418210.1947@nanos.tec.linutronix.de
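
A hedged C sketch of the idea behind the fix; the structure layout and helper below are illustrative, not the actual apic code: the first CPU that comes up in a cluster stamps the shared structure with its cluster ID so that later-booting CPUs can match on that ID and share the structure.

    struct cluster_mask {
            unsigned int    clusterid;      /* was left at 0 for every cluster */
            struct cpumask  mask;
    };

    static void init_cluster_for_cpu(struct cluster_mask *cmsk, u32 apicid)
    {
            /* x2APIC logical destinations: cluster = physical APIC ID >> 4 */
            if (cpumask_empty(&cmsk->mask))         /* first CPU of this cluster */
                    cmsk->clusterid = apicid >> 4;  /* the missing initialization */

            cpumask_set_cpu(smp_processor_id(), &cmsk->mask);
    }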
-
Submitted by Michael S. Tsirkin

KVM_HINTS_DEDICATED seems to be somewhat confusing: the guest doesn't really care whether it's the only task running on a host CPU as long as it's not preempted. And there are more reasons for a guest to be preempted than host CPU sharing; for example, with memory overcommit it can get preempted on a memory access, post-copy migration can cause preemption, etc.

Let's call it KVM_HINTS_REALTIME, which seems to better match what guests expect.

Also, the flag must be set on all vCPUs - current guests assume this. Note so in the documentation.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 17 May 2018, 15 commits
-
-
Submitted by Tom Lendacky

Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD.

[ tglx: Folded the migration fixup from Paolo Bonzini ]

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Submitted by Thomas Gleixner

Add the necessary logic for supporting the emulated VIRT_SPEC_CTRL MSR to x86_virt_spec_ctrl(). If either X86_FEATURE_LS_CFG_SSBD or X86_FEATURE_VIRT_SPEC_CTRL is set, then use the new guest_virt_spec_ctrl argument to check whether the state must be modified on the host. The update reuses speculative_store_bypass_update() so the ZEN-specific sibling coordination can be reused.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Submitted by Thomas Gleixner

x86_spec_ctrl_mask is intended to mask out bits from a MSR_SPEC_CTRL value which are not to be modified. However, the implementation is not really used and the bitmask was inverted to make a check easier, which was removed in "x86/bugs: Remove x86_spec_ctrl_set()".

Aside from that, it is missing the STIBP bit if it is supported by the platform, so if the mask were used in x86_virt_spec_ctrl() it would prevent a guest from setting STIBP.

Add the STIBP bit if supported and use the mask in x86_virt_spec_ctrl() to sanitize the value which is supplied by the guest.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
-
Submitted by Thomas Gleixner

x86_spec_ctrl_set() is only used in bugs.c, and the extra mask checks there provide no real value as both call sites can just write x86_spec_ctrl_base to MSR_SPEC_CTRL. x86_spec_ctrl_base is valid and does not need any extra masking or checking.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Thomas Gleixner

x86_spec_ctrl_base is the system-wide default value for the SPEC_CTRL MSR. x86_spec_ctrl_get_default() returns x86_spec_ctrl_base and was intended to prevent modification of that variable. However, the variable is read-only after init and globally visible already.

Remove the function and export the variable instead.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Borislav Petkov

The function bodies are very similar and are going to grow more, almost identical, code. Add a bool argument to determine whether SPEC_CTRL is being set for the guest or restored to the host.

No functional changes.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Thomas Gleixner

The upcoming support for the virtual SPEC_CTRL MSR on AMD needs to reuse speculative_store_bypass_update() to avoid code duplication. Add an argument for supplying a thread info (TIF) value and create a wrapper speculative_store_bypass_update_current() which is used at the existing call site.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
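
A minimal sketch of the interface change described here; the function names come from the commit text, while the wrapper body is an assumption about the obvious implementation:

    /* Now takes an explicit TIF snapshot so callers such as the AMD virtual
     * SPEC_CTRL path can pass in flags that are not current's live flags. */
    void speculative_store_bypass_update(unsigned long tif);

    /* Wrapper used at the pre-existing call site: act on the current task. */
    static inline void speculative_store_bypass_update_current(void)
    {
            speculative_store_bypass_update(current_thread_info()->flags);
    }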
-
Submitted by Tom Lendacky

Some AMD processors only support a non-architectural means of enabling speculative store bypass disable (SSBD). To allow a simplified view of this to a guest, an architectural definition has been created through a new CPUID bit, 0x80000008_EBX[25], and a new MSR, 0xc001011f. With this, a hypervisor can virtualize the existence of this definition and provide an architectural method for using SSBD to a guest.

Add the new CPUID feature, the new MSR, and update the existing SSBD support to use this MSR when present.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
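
A hedged sketch of how the new MSR can be used once the CPUID bit is seen; the MSR address comes from the commit text, while the macro names and the SSBD bit position (bit 2, as in MSR_SPEC_CTRL) are assumptions for illustration:

    #define MSR_AMD64_VIRT_SPEC_CTRL    0xc001011f      /* from the commit text */
    #define SPEC_CTRL_SSBD              (1ULL << 2)     /* assumed: same bit as MSR_SPEC_CTRL */

    /* Toggle SSBD through the architectural, virtualized MSR when the
     * hypervisor advertises CPUID 0x80000008 EBX[25]. */
    static void virt_ssbd_set(bool enable)
    {
            wrmsrl(MSR_AMD64_VIRT_SPEC_CTRL, enable ? SPEC_CTRL_SSBD : 0);
    }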
-
Submitted by Thomas Gleixner

AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG, so that guests do not have to care about the bit position of the SSBD bit, which facilitates migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host.

Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX, and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Thomas Gleixner

The AMD64_LS_CFG MSR is a per-core MSR on Family 17H CPUs. That means when hyperthreading is enabled the SSBD bit toggle needs to take both siblings into account. Otherwise the following situation can happen:

    CPU0                    CPU1
    disable SSB
                            disable SSB
                            enable  SSB  <- Enables it for the core, i.e. for CPU0 as well

So after the SSB enable on CPU1, the task on CPU0 runs with SSB enabled again.

On Intel the SSBD control is per core as well, but the synchronization logic is implemented behind the per-thread SPEC_CTRL MSR. It works like this:

    CORE_SPEC_CTRL = THREAD0_SPEC_CTRL | THREAD1_SPEC_CTRL

i.e. if one of the threads enables a mitigation then this affects both, and the mitigation is only disabled in the core when both threads have disabled it.

Add the necessary synchronization logic for AMD Family 17H. Unfortunately that requires a spinlock to serialize the access to the MSR, but the locks are only shared between siblings.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
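
A simplified, hedged sketch of the sibling coordination (illustrative structure and refcount scheme, not the actual implementation): the two siblings of a core share one state object protected by a spinlock, and the per-core MSR is only rewritten when the number of threads wanting SSB disabled crosses zero, so one thread re-enabling speculation cannot undo the other thread's mitigation.

    struct core_ssb_state {
            raw_spinlock_t  lock;           /* shared only between the two siblings */
            unsigned int    disable_count;  /* siblings currently wanting SSB disabled */
    };

    static void core_ssbd_update(struct core_ssb_state *st, u64 ls_cfg_base,
                                 u64 ssbd_bit, bool disable)
    {
            raw_spin_lock(&st->lock);

            if (disable) {
                    if (st->disable_count++ == 0)   /* 0 -> 1: set the bit */
                            wrmsrl(MSR_AMD64_LS_CFG, ls_cfg_base | ssbd_bit);
            } else {
                    if (--st->disable_count == 0)   /* 1 -> 0: clear the bit */
                            wrmsrl(MSR_AMD64_LS_CFG, ls_cfg_base);
            }

            raw_spin_unlock(&st->lock);
    }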
-
Submitted by Thomas Gleixner

Add a ZEN feature bit so family-dependent static_cpu_has() optimizations can be built for ZEN.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Thomas Gleixner

The SSBD enumeration is, similarly to the other bits, magically shared between Intel and AMD, though the mechanisms are different.

Make X86_FEATURE_SSBD synthetic and set it depending on the vendor-specific features or family-dependent setup.

Change the Intel bit to X86_FEATURE_SPEC_CTRL_SSBD to denote that SSBD is controlled via MSR_SPEC_CTRL, and fix up the usage sites.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Thomas Gleixner

The availability of the SPEC_CTRL MSR is enumerated by a CPUID bit on Intel and implied by IBRS or STIBP support on AMD. That's just confusing, and in case an AMD CPU does not support IBRS because the underlying problem has been fixed, but has another bit valid in the SPEC_CTRL MSR, the whole thing falls apart.

Add a synthetic feature bit X86_FEATURE_MSR_SPEC_CTRL to denote the availability on both Intel and AMD.

While at it, replace the boot_cpu_has() checks with static_cpu_has() where possible. This prevents late microcode loading from exposing SPEC_CTRL, but late loading is already very limited as it does not reevaluate the mitigation options and other bits and pieces. Having static_cpu_has() is the simplest and least fragile solution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Submitted by Borislav Petkov

Intel and AMD have different CPUID bits, hence for those use synthetic bits which get set on the respective vendor's CPUs in init_speculation_control(). That way, debacles like the one the commit message of c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tested-by: Jörg Otte <jrg.otte@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
-
Submitted by Thomas Gleixner

svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS-based, but host GS is not yet restored, and the access causes a triple fault.

Move the call after the host GS restore.

Fixes: 885f82bf ("x86/process: Allow runtime control of Speculative Store Bypass")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 16 May 2018, 2 commits
-
-
Submitted by Kirill A. Shutemov

cleanup_trampoline() relocates the top-level page table out of trampoline memory. We use 'top_pgtable' as our new top-level page table.

But if 'top_pgtable' were referenced from C in the usual way, the address of the table would be calculated relative to RIP. After the kernel gets relocated, the address would be in the middle of the decompression buffer and the page table may get overwritten. This leads to a crash.

We calculate the addresses of the other page tables relative to the relocation address. It makes them safe. We should do the same for 'top_pgtable'.

Calculate the address of 'top_pgtable' in assembly and pass it down to cleanup_trampoline(). Move the page table to the .pgtable section where the rest of the page tables are. The section is @nobits, so we save 4k in the kernel image.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: e9d0e633 ("x86/boot/compressed/64: Prepare new top-level page table for trampoline")
Link: http://lkml.kernel.org/r/20180516080131.27913-3-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Submitted by Kirill A. Shutemov

Eric and Hugh have reported an instant reboot due to my recent changes in the decompression code.

The root cause is that I didn't realize that we need to adjust the GOT to be able to run C code that early.

The problem is only visible with an older toolchain. Binutils >= 2.24 is able to eliminate GOT references by replacing them with RIP-relative address loads:

    https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=80d873266dec

We need to adjust the GOT two times:

- before calling paging_prepare(), using the initial load address
- before calling C code from the relocated kernel

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 194a9749 ("x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G")
Link: http://lkml.kernel.org/r/20180516080131.27913-2-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 15 May 2018, 2 commits
-
-
Submitted by Wanpeng Li

Anthoine reported:

    The period used by Windows changes over time, but it can be 1 millisecond or less. I saw the limit_periodic_timer_frequency print, so 500 microseconds is sometimes reached.

As suggested by Paolo, lower the default timer frequency limit to a smaller interval of 200 us (5000 Hz) to leave some headroom. This is required due to Windows 10 changing the scheduler tick limit from 1024 Hz to 2048 Hz.

Reported-by: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Anthoine Bourgeois <anthoine.bourgeois@blade-group.com>
Cc: Darren Kenny <darren.kenny@oracle.com>
Cc: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Steven Rostedt (VMware)

Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero-data-size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see whether a function is hit or not, simply make that function noinline and use function tracer filtering.

Worse yet, the hack used was:

    __array(char, x, 0)

which creates a static string of zero length. There are assumptions about such constructs in ftrace, namely that this is a dynamic string that is nul terminated. This is not the case with these tracepoints and can cause problems in various parts of ftrace.

Nuke the trace events!

Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home
Cc: stable@vger.kernel.org
Fixes: 95a7d768 ("xen/mmu: Use Xen specific TLB flush instead of the generic one.")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 14 May 2018, 10 commits
-
-
Submitted by Dave Hansen

mm_pkey_is_allocated() treats pkey 0 as unallocated. That is inconsistent with the manpages, and also inconsistent with mm->context.pkey_allocation_map. Stop special-casing it and only disallow values that are actually bad (< 0).

The end-user visible effect of this is that you can now use mprotect_pkey() to set pkey=0.

This is a bit nicer than what Ram proposed[1] because it is simpler and removes special-casing for pkey 0. On the other hand, it does allow applications to pkey_free() pkey-0, but that's just a silly thing to do, so we are not going to protect against it.

The scenario that could happen is similar to what happens if you free any other pkey that is in use: it might get reallocated later and used to protect some other data. The most likely scenario is that pkey-0 comes back from pkey_alloc(), an access-disable or write-disable bit is set in PKRU for it, and the next stack access will SIGSEGV. It's not horribly different from if you mprotect()'d your stack or heap to be unreadable or unwritable, which is generally very foolish, but also not explicitly prevented by the kernel.

1. http://lkml.kernel.org/r/1522112702-27853-1-git-send-email-linuxram@us.ibm.com

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Fixes: 58ab9a08 ("x86/pkeys: Check against max pkey to avoid overflows")
Link: http://lkml.kernel.org/r/20180509171358.47FD785E@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
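
A hedged userspace sketch of the end-user visible change (an illustrative test, not taken from the commit; pkey_mprotect() needs glibc 2.27 or later): explicitly assigning the default key 0 used to be rejected and now succeeds.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
            long len = 4096;
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            /* pkey 0 is the default key every mapping starts with; naming it
             * explicitly used to fail and is allowed after this change. */
            if (pkey_mprotect(p, len, PROT_READ | PROT_WRITE, 0))
                    perror("pkey_mprotect");
            else
                    puts("pkey 0 assigned explicitly");

            p[0] = 1;       /* still accessible under the default key */
            return 0;
    }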
-
Submitted by Dave Hansen

I got a bug report that the following code (roughly) was causing a SIGSEGV:

    mprotect(ptr, size, PROT_EXEC);
    mprotect(ptr, size, PROT_NONE);
    mprotect(ptr, size, PROT_READ);
    *ptr = 100;

The problem is hit when the mprotect(PROT_EXEC) implicitly assigns a protection key to the VMA and makes that key ACCESS_DENY|WRITE_DENY. The PROT_NONE mprotect() failed to remove the protection key, and the PROT_NONE -> PROT_READ left the PTE usable, but with the pkey still in place, leaving the memory inaccessible.

To fix this, we ensure that we always "override" the pkey at mprotect() if the VMA does not have execute-only permissions but the VMA has the execute-only pkey.

We had a check for PROT_READ/WRITE, but it did not work for PROT_NONE. This entirely removes the PROT_* checks, which ensures that PROT_NONE now works.

Reported-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Ellermen <mpe@ellerman.id.au>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: stable@vger.kernel.org
Fixes: 62b5f7d0 ("mm/core, x86/mm/pkeys: Add execute-only protection keys support")
Link: http://lkml.kernel.org/r/20180509171351.084C5A71@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Submitted by Alexander Potapenko

Clang builds with defconfig started crashing after the following commit:

    fb43d6cb ("x86/mm: Do not auto-massage page protections")

This was caused by introducing a new global access in __startup_64().

Code in __startup_64() can be relocated during execution, but the compiler doesn't have to generate PC-relative relocations when accessing globals from that function. Clang actually does not generate them, which leads to boot-time crashes. To work around this problem, every global pointer must be adjusted using fixup_pointer().

Signed-off-by: Alexander Potapenko <glider@google.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: dvyukov@google.com
Cc: kirill.shutemov@linux.intel.com
Cc: linux-mm@kvack.org
Cc: md@google.com
Cc: mka@chromium.org
Fixes: fb43d6cb ("x86/mm: Do not auto-massage page protections")
Link: http://lkml.kernel.org/r/20180509091822.191810-1-glider@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
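
A hedged sketch of the fixup_pointer() pattern referenced above (simplified; the placeholder global and the exact arithmetic are assumptions): a global's link-time address is rebased onto the physical address the image is currently running from before it is dereferenced, so the access does not depend on the compiler emitting a PC-relative reference.

    extern char _text[];                    /* link-time start of the kernel image */
    unsigned long some_global_mask;         /* stand-in for any global used this early */

    static void *fixup_pointer(void *ptr, unsigned long physaddr)
    {
            return ptr - (void *)_text + (void *)physaddr;
    }

    static unsigned long read_global_early(unsigned long physaddr)
    {
            unsigned long *mask = fixup_pointer(&some_global_mask, physaddr);

            return *mask;   /* safe even before the kernel runs from its link address */
    }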
-
Submitted by Jim Mattson

Cast val and (val >> 32) to (u32), so that they fit in a general-purpose register in both 32-bit and 64-bit code.

[ tglx: Made it u32 instead of uintptr_t ]

Fixes: c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
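
A hedged sketch of the pattern, with an illustrative wrapper name: WRMSR takes the low half of the value in EAX and the high half in EDX, so truncating both operands to u32 guarantees they fit a general-purpose register in 32-bit as well as 64-bit builds.

    static inline void example_wrmsr(unsigned int msr, u64 val)
    {
            /* "c" = ECX (MSR index), "a" = EAX (low 32 bits), "d" = EDX (high 32 bits) */
            asm volatile("wrmsr"
                         : /* no outputs */
                         : "c" (msr), "a" ((u32)val), "d" ((u32)(val >> 32))
                         : "memory");
    }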
-
Submitted by Ard Biesheuvel

Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires us to define all struct definitions carefully when it comes to pointer sizes.

'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes.

Tested-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
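
A hedged sketch of the layout problem (field set abbreviated and names illustrative, not the full EFI protocol definition): in a structure that mirrors what 32-bit firmware wrote into memory, firmware pointers must be fixed-width u32 fields; a native C pointer makes a 64-bit kernel read 8 bytes where the firmware stored 4 and shifts every following field.

    /* Broken for mixed mode: 'romimage' is 8 bytes when built for x86_64. */
    struct pci_io_proto_32_broken {
            u64     romsize;
            void    *romimage;
    };

    /* Matches the 32-bit firmware layout regardless of kernel bitness. */
    struct pci_io_proto_32_fixed {
            u64     romsize;
            u32     romimage;       /* 32-bit firmware pointer, not a C pointer */
    };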
-
Submitted by Alexei Starovoitov

Workaround for the sake of BPF compilation, which utilizes kernel headers but where clang does not support ASM GOTO and fails the build.

Fixes: d0266046 ("x86: Remove FAST_FEATURE_TESTS")
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: daniel@iogearbox.net
Cc: peterz@infradead.org
Cc: netdev@vger.kernel.org
Cc: bp@alien8.de
Cc: yhs@fb.com
Cc: kernel-team@fb.com
Cc: torvalds@linux-foundation.org
Cc: davem@davemloft.net
Link: https://lkml.kernel.org/r/20180513193222.1997938-1-ast@kernel.org
-
Submitted by Masami Hiramatsu

Since the MOV SS and POP SS instructions delay exceptions until the next instruction is executed, single-stepping on them by uprobes must be prohibited.

uprobes already rejects probing on POP SS (0x1f), but allows probing on MOV SS (0x8e and reg == 2). This checks the target instruction and, if it is MOV SS or POP SS, returns -ENOTSUPP to reject probing.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/152587072544.17316.5950935243917346341.stgit@devbox
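
A hedged, self-contained sketch of the opcode check this describes (not the exact kernel helper): POP SS is opcode 0x1f, and MOV to a segment register is opcode 0x8e, where a ModRM reg field of 2 selects %ss.

    #include <stdbool.h>

    /* True when the instruction loads %ss and must not be single-stepped,
     * because the CPU delays exceptions until after the next instruction. */
    static bool insn_loads_ss(unsigned char opcode, unsigned char modrm)
    {
            if (opcode == 0x1f)                     /* POP %ss */
                    return true;
            if (opcode == 0x8e)                     /* MOV r/m, Sreg */
                    return ((modrm >> 3) & 0x7) == 2;       /* reg field 2 == %ss */
            return false;
    }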
-
Submitted by Masami Hiramatsu

Since the MOV SS and POP SS instructions delay exceptions until the next instruction is executed, single-stepping on them by kprobes must be prohibited.

However, kprobes usually executes those instructions directly on the trampoline buffer (a.k.a. kprobe-booster), except for kprobes which have a post_handler. Thus, if a kprobe user probes MOV SS with a post_handler, it will do single-stepping on the MOV SS. This means it is safe if it is used via ftrace or perf/bpf, since those don't use the post_handler.

Anyway, since the stack switching is a rare case, it is safer to just reject kprobes on such instructions.

Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Francis Deslauriers <francis.deslauriers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "David S . Miller" <davem@davemloft.net>
Link: https://lkml.kernel.org/r/152587069574.17316.3311695234863248641.stgit@devbox
-
Submitted by Tetsuo Handa

x86/kexec: avoid double free_page() upon do_kexec_load() failure.

syzbot is reporting crashes after memory allocation failure inside do_kexec_load() [1]. This is because free_transition_pgtable() is called by both init_transition_pgtable() and machine_kexec_cleanup() when memory allocation failed inside init_transition_pgtable().

Regarding 32-bit code, machine_kexec_free_page_tables() is called by both machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory allocation failed inside machine_kexec_alloc_page_tables().

Fix this by leaving the error handling to machine_kexec_cleanup() (and optionally setting NULL after free_page()).

[1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40

Fixes: f5deb796 ("x86: kexec: Use one page table in x86_64 machine_kexec")
Fixes: 92be3d6b ("kexec/i386: allocate page table pages dynamically")
Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: thomas.lendacky@amd.com
Cc: prudo@linux.vnet.ibm.com
Cc: Huang Ying <ying.huang@intel.com>
Cc: syzkaller-bugs@googlegroups.com
Cc: takahiro.akashi@linaro.org
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: akpm@linux-foundation.org
Cc: dyoung@redhat.com
Cc: kirill.shutemov@linux.intel.com
Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp
-
Submitted by Guenter Roeck

Add Raven Ridge root bridge and data fabric PCI IDs. This is required for amd_pci_dev_to_node_id() and amd_smn_read().

Cc: stable@vger.kernel.org # v4.16+
Tested-by: Gabriel Craciunescu <nix.or.die@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
-
- 12 May 2018, 1 commit
-
-
Submitted by Konrad Rzeszutek Wilk

Fixes: 7bb4d366 ("x86/bugs: Make cpu_show_common() static")
Fixes: 24f7fc83 ("x86/bugs: Provide boot parameters for the spec_store_bypass_disable mitigation")
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 11 May 2018, 4 commits
-
-
Submitted by Sean Christopherson

Update SECONDARY_EXEC_DESC for UMIP emulation if and only if UMIP is actually being emulated. Skipping the VMCS update eliminates unnecessary VMREAD/VMWRITE when UMIP is supported in hardware, and on platforms that don't have SECONDARY_VM_EXEC_CONTROL. The latter case resolves a bug where KVM would fill the kernel log with warnings due to failed VMWRITEs on older platforms.

Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
Cc: stable@vger.kernel.org # 4.16
Reported-by: Paolo Zeppegno <pzeppegno@gmail.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Submitted by Junaid Shahid

If the PCIDE bit is not set in CR4, then the MSb of CR3 is a reserved bit. If the guest tries to set it, that should cause a #GP fault. So mask out the bit only when the PCIDE bit is set.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
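
A hedged sketch of the check being described, with illustrative macro and parameter names: bit 63 of CR3 is the PCID no-flush hint and is only legal when CR4.PCIDE is set, so it is stripped before the reserved-bit test only in that case; otherwise a guest setting it hits the #GP path.

    #define CR3_NOFLUSH_BIT     (1ULL << 63)    /* MSb of CR3, valid only with PCIDE */
    #define CR4_PCIDE_BIT       (1ULL << 17)

    /* 'physaddr_mask' covers the implemented physical address bits (illustrative). */
    static bool cr3_has_reserved_bits(u64 cr3, u64 cr4, u64 physaddr_mask)
    {
            if (cr4 & CR4_PCIDE_BIT)
                    cr3 &= ~CR3_NOFLUSH_BIT;    /* the hint bit is allowed here */

            return (cr3 & ~physaddr_mask) != 0;
    }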
-
Submitted by Paolo Bonzini

Even though the eventfd is released after the KVM SRCU grace period elapses, the conn_to_evt data structure itself is not; it uses RCU internally instead. Fix the read-side critical section to happen under rcu_read_lock/unlock; the result is still protected by vcpu->kvm->srcu.

Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
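
A hedged sketch of the corrected read side (the lookup is shown with an IDR, which is RCU-friendly; the surrounding structure and call site are assumptions): both the lookup and the use of the looked-up eventfd stay inside rcu_read_lock()/rcu_read_unlock().

    static int hv_signal_connection(struct kvm_hv *hv, u32 conn_id)
    {
            struct eventfd_ctx *eventfd;

            rcu_read_lock();                        /* conn_to_evt uses RCU internally */
            eventfd = idr_find(&hv->conn_to_evt, conn_id);
            if (eventfd)
                    eventfd_signal(eventfd, 1);     /* use the pointer inside the read section */
            rcu_read_unlock();

            return eventfd ? 0 : -ENOENT;
    }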
-
Submitted by Marian Rotariu

The IP increment should be done after the hypercall emulation, after calling the various handlers. In this way, these handlers can accurately identify the IP of the VMCALL if they need it.

This patch keeps the same functionality for the Hyper-V handler, which does not use the return code of the standard kvm_skip_emulated_instruction() call.

Signed-off-by: Marian Rotariu <mrotariu@bitdefender.com>
[Hyper-V hypercalls also need kvm_skip_emulated_instruction() - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-