- 26 Aug 2021, 1 commit
-
-
Committed by Kim Phillips
Factor out a helper function rather than export cpu_llc_id, which is needed in order to be able to build the AMD uncore driver as a module.
Signed-off-by: Kim Phillips <kim.phillips@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210817221048.88063-7-kim.phillips@amd.com
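A minimal sketch of what such a helper could look like; the function name and export are assumptions for illustration, not taken from this log:

```c
#include <linux/percpu.h>
#include <linux/export.h>

/*
 * Hypothetical accessor: lets modules query the LLC ID without
 * exporting the cpu_llc_id percpu variable itself.
 */
u16 get_llc_id(unsigned int cpu)
{
	return per_cpu(cpu_llc_id, cpu);
}
EXPORT_SYMBOL_GPL(get_llc_id);
```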
-
- 24 Jun 2021, 1 commit
-
-
Committed by Dave Hansen
PKRU is currently partly XSAVE-managed and partly not. It has space in the task XSAVE buffer and is context-switched by XSAVE/XRSTOR. However, it is switched more eagerly than FPU because there may be a need for PKRU to be up-to-date for things like copy_to/from_user() since PKRU affects user-permission memory accesses, not just accesses from userspace itself.

This leaves PKRU in a very odd position. XSAVE brings very little value to the table for how Linux uses PKRU except for signal related XSTATE handling.

Prepare to move PKRU away from being XSAVE-managed. Allocate space in the thread_struct for it and save/restore it in the context-switch path separately from the XSAVE-managed features. task->thread_struct.pkru is only valid when the task is scheduled out. For the current task the authoritative source is the hardware, i.e. it has to be retrieved via rdpkru().

Leave the XSAVE code in place for now to ensure bisectability.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20210623121456.399107624@linutronix.de
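Conceptually, the separate context-switch handling described above boils down to something like this sketch (pkru_switch() is a made-up name for illustration; the real code hooks into the switch_to() path):

```c
#include <linux/sched.h>
#include <asm/pkru.h>
#include <asm/cpufeature.h>

/*
 * Sketch: save the outgoing task's PKRU from hardware, load the incoming
 * task's stored value. Only meaningful when protection keys are enabled.
 */
static inline void pkru_switch(struct task_struct *prev,
			       struct task_struct *next)
{
	if (!cpu_feature_enabled(X86_FEATURE_OSPKE))
		return;

	/* For the running task the hardware register is authoritative. */
	prev->thread.pkru = rdpkru();
	wrpkru(next->thread.pkru);
}
```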
-
- 18 May 2021, 1 commit
-
-
Committed by Borislav Petkov
SEV-ES guests require a properly set up task register, with which the TSS descriptor in the GDT can be located, so that the IST-type #VC exception handler they need to function properly can be executed. This setup needs to happen before attempting to load microcode in ucode_cpu_init() on secondary CPUs, which can cause such #VC exceptions.

Simplify the machinery by running that exception setup from a new function cpu_init_secondary() and explicitly calling cpu_init_exception_handling() for the boot CPU before cpu_init(). The latter prepares for fixing and simplifying the exception/IST setup on the boot CPU.

There should be no functional changes resulting from this patch.

[ tglx: Reworked it so cpu_init_exception_handling() stays separate ]

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/87k0o6gtvu.ffs@nanos.tec.linutronix.de
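A sketch of the secondary-CPU ordering this describes, assuming the new helper simply chains the two existing init functions:

```c
/*
 * Sketch: exception handling must be ready before anything that can
 * raise #VC (such as microcode loading) runs on a secondary CPU.
 */
void cpu_init_secondary(void)
{
	cpu_init_exception_handling();
	cpu_init();
}
```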
-
- 13 May 2021, 1 commit
-
-
Committed by Huang Rui
Some AMD Ryzen generations have a different way of calculating the maximum performance value: 255 is not correct for all ASICs, and some generations should use 166 as the maximum performance instead. Otherwise, an incorrect frequency value is reported, like below:

  ~ → lscpu | grep MHz
  CPU MHz:          3400.000
  CPU max MHz:      7228.3198
  CPU min MHz:      2200.0000

[ mingo: Tidied up whitespace use. ]
[ Alexander Monakov <amonakov@ispras.ru>: fix 225 -> 255 typo. ]

Fixes: 41ea6672 ("x86, sched: Calculate frequency invariance for AMD systems")
Fixes: 3c55e94c ("cpufreq: ACPI: Extend frequency tables to cover boost frequencies")
Reported-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Fixed-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jason Bagavatsingham <jason.bagavatsingham@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210425073451.2557394-1-ray.huang@amd.com
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=211791
Signed-off-by: Ingo Molnar <mingo@kernel.org>
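The fix amounts to selecting 166 rather than 255 as the highest-performance value on the affected parts. A rough sketch of the idea; the family/model checks below are illustrative placeholders, not the definitive list of affected generations:

```c
/*
 * Sketch: pick the CPPC highest-performance value used by the
 * frequency-invariance calculation.
 */
static u32 amd_highest_perf(void)
{
	struct cpuinfo_x86 *c = &boot_cpu_data;

	/* Illustrative ranges: some Zen generations enumerate 166. */
	if (c->x86 == 0x17 && c->x86_model >= 0x30)
		return 166;
	if (c->x86 == 0x19)
		return 166;

	return 255;
}
```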
-
- 29 Mar 2021, 1 commit
-
-
Committed by Lai Jiangshan
cpu_current_top_of_stack is currently stored in TSS.sp1. TSS is exposed through the cpu_entry_area, which is visible with user CR3 when PTI is enabled and active. This makes it a coveted fruit for attackers: an attacker can fetch the kernel stack top from it and continue the next steps of the attack based on the kernel stack.

But it is actually not necessary for it to be stored in the TSS. It is only accessed after the entry code has switched to kernel CR3 and kernel GS_BASE, which means it can live in any regular percpu variable. The reason why it is in the TSS is historical (pre-PTI): the TSS is also used as scratch space in SYSCALL_64 and is therefore cache hot.

A syscall also needs the per-CPU variable current_task and eventually __preempt_count, so placing cpu_current_top_of_stack next to them makes it likely that they end up in the same cache line, which should avoid performance regressions. This is not enforced, as the compiler is free to place these variables, so these entry-relevant variables should move into a data structure to make this enforceable.

The seccomp_benchmark doesn't show any performance loss in the "getpid native" test result. Actually, the result changes from 93ns before to 92ns with this change when KPTI is disabled. The test is very stable and although it doesn't show a higher degree of precision, it gives enough confidence that moving cpu_current_top_of_stack does not cause a regression.

[ tglx: Removed unneeded export. Massaged changelog ]

Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210125173444.22696-2-jiangshanlai@gmail.com
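A sketch of roughly how the value looks once it is an ordinary percpu variable instead of TSS.sp1 (declaration and accessor shown under that assumption):

```c
#include <asm/percpu.h>

/* Sketch: a plain percpu variable, accessed only with kernel CR3/GS_BASE. */
DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack);

static inline unsigned long current_top_of_stack(void)
{
	return this_cpu_read_stable(cpu_current_top_of_stack);
}
```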
-
- 18 Mar 2021, 1 commit
-
-
Committed by Ingo Molnar
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-kernel@vger.kernel.org
-
- 17 Mar 2021, 1 commit
-
-
Committed by Oleg Nesterov
Move TS_COMPAT back to asm/thread_info.h, close to TS_I386_REGS_POKED. It was moved to asm/processor.h by b9d989c7 ("x86/asm: Move the thread_info::status field to thread_struct"); then later 37a8f7c3 ("x86/asm: Move 'status' from thread_struct to thread_info") moved the 'status' field back, but TS_COMPAT was forgotten.

Preparatory patch to fix the COMPAT case for get_nr_restart_syscall().

Fixes: 609c19a3 ("x86/ptrace: Stop setting TS_COMPAT in ptrace code")
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20210201174649.GA17880@redhat.com
-
- 08 Mar 2021, 1 commit
-
-
Committed by Andy Lutomirski
On 32-bit kernels, the stackprotector canary is quite nasty -- it is stored at %gs:(20), which is nasty because 32-bit kernels use %fs for percpu storage. It's even nastier because it means that whether %gs contains userspace state or kernel state while running kernel code depends on whether stackprotector is enabled (this is CONFIG_X86_32_LAZY_GS), and this setting radically changes the way that segment selectors work. Supporting both variants is a maintenance and testing mess.

Merely rearranging so that percpu and the stack canary share the same segment would be messy, as the 32-bit percpu address layout isn't currently compatible with putting a variable at a fixed offset.

Fortunately, GCC 8.1 added options that allow the stack canary to be accessed as %fs:__stack_chk_guard, effectively turning it into an ordinary percpu variable. This lets us get rid of all of the code to manage the stack canary GDT descriptor and the CONFIG_X86_32_LAZY_GS mess. (That name is special. We could use any symbol we want for the %fs-relative mode, but for CONFIG_SMP=n, gcc refuses to let us use any name other than __stack_chk_guard.)

Forcibly disable stackprotector on older compilers that don't support the new options and turn the stack canary into a percpu variable. The "lazy GS" approach is now used for all 32-bit configurations.

Also make load_gs_index() work on 32-bit kernels. On 64-bit kernels, it loads the GS selector and updates the user GSBASE accordingly. (This is unchanged.) On 32-bit kernels, it loads the GS selector and updates GSBASE, which is now always the user base. This means that the overall effect is the same on 32-bit and 64-bit, which avoids some ifdeffery.

[ bp: Massage commit message. ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/c0ff7dba14041c7e5d1cae5d4df052f03759bef3.1613243844.git.luto@kernel.org
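The GCC options in question are -mstack-protector-guard-reg=fs and -mstack-protector-guard-symbol=__stack_chk_guard; with them, the canary ends up as a plain percpu symbol, roughly like this sketch:

```c
#include <asm/percpu.h>

/*
 * Sketch: with the new GCC options the compiler emits
 * %fs:__stack_chk_guard accesses, so the canary is just another
 * percpu variable rather than a special GDT-managed slot.
 */
DECLARE_PER_CPU(unsigned long, __stack_chk_guard);
```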
-
- 11 Feb 2021, 2 commits
-
-
Committed by Thomas Gleixner
The per-CPU hardirq_stack_ptr contains the pointer to the irq stack in the form that it is ready to be assigned to [ER]SP, so that the first push ends up on the top entry of the stack.

But the stack switching on 64 bit has the following rules:

 1) Store the current stack pointer (RSP) in the top most stack entry to allow the unwinder to link back to the previous stack
 2) Set RSP to the top most stack entry
 3) Invoke functions on the irq stack
 4) Pop RSP from the top most stack entry (stored in #1) so it's back to the original stack.

That requires all stack switching code to decrement the stored pointer by 8 in order to be able to store the current RSP and then set RSP to that location. That's a pointless exercise. Do the -8 adjustment right when storing the pointer and make the data type a void pointer to avoid confusion vs. the struct irq_stack data type, which on 64 bit is only used to declare the backing store.

Move the definition next to the inuse flag so they likely end up in the same cache line. Sticking them into a struct to enforce it is a separate change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002512.354260928@linutronix.de
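A sketch of the initialization this implies: the stored pointer already accounts for the 8-byte slot that will hold the old RSP (function and variable names here follow the description above, written from memory rather than this log):

```c
/*
 * Sketch: store the irq stack pointer pre-decremented by 8 so the entry
 * code can write the old RSP into that slot and load RSP directly from
 * the stored value.
 */
static void init_hardirq_stack(unsigned int cpu, void *stack_base)
{
	per_cpu(hardirq_stack_ptr, cpu) = stack_base + IRQ_STACK_SIZE - 8;
}
```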
-
Committed by Thomas Gleixner
The recursion protection for hard interrupt stacks is an unsigned int per-CPU variable initialized to -1 named __irq_count. The irq stack switching is only done when the variable is -1, which creates worse code than just checking for 0. When the stack switching happens it uses this_cpu_add/sub(1), but there is no reason to do so. It simply can use straight writes. This is a historical leftover from the low level ASM code, which used inc and jz to make a decision.

Rename it to hardirq_stack_inuse, make it a bool and use plain stores.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210210002512.228830141@linutronix.de
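A sketch of the resulting switching logic under that scheme (call_on_irqstack() stands in for the real stack-switching helper and is purely illustrative):

```c
/* Sketch: recursion check with plain bool stores instead of a -1 counter. */
static void run_on_irqstack_maybe(void (*func)(void))
{
	if (!this_cpu_read(hardirq_stack_inuse)) {
		this_cpu_write(hardirq_stack_inuse, true);
		call_on_irqstack(func);		/* illustrative helper */
		this_cpu_write(hardirq_stack_inuse, false);
	} else {
		func();				/* already on the irq stack */
	}
}
```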
-
- 19 Nov 2020, 1 commit
-
-
Committed by Yazen Ghannam
The Last Level Cache ID is returned by amd_get_nb_id(). In practice, this value is the same as the AMD NodeId for callers of this function. The NodeId is saved in struct cpuinfo_x86.cpu_die_id.

Replace calls to amd_get_nb_id() with the logical CPU's cpu_die_id and remove the function.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20201109210659.754018-3-Yazen.Ghannam@amd.com
-
- 09 Sep 2020, 3 commits
-
-
Committed by Joerg Roedel
The IDT on 64-bit contains vectors which use paranoid_entry() and/or IST stacks. To make these vectors work, the TSS and the getcpu GDT entry need to be set up before the IDT is loaded.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20200907131613.12703-68-joro@8bytes.org
-
Committed by Christoph Hellwig
Stop providing the possibility to override the address space using set_fs() now that there is no need for that any more. To properly handle the TASK_SIZE_MAX checking for 4 vs 5-level page tables on x86, a new alternative is introduced, which just like the one in entry_64.S has to use the hardcoded virtual address bits to escape the fact that TASK_SIZE_MAX isn't actually a constant when 5-level page tables are enabled.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Christoph Hellwig
At least for 64-bit this moves them closer to some of the defines they are based on, and it prepares for using the TASK_SIZE_MAX definition from assembly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 04 Sep 2020, 1 commit
-
-
Committed by Peter Zijlstra
Current usage of thread.debugreg6 is convoluted at best. It starts life as a copy of the hardware DR6 value, but then various bits are cleared and set.

Replace this with a new variable thread.virtual_dr6 that is initialized to 0 when DR6 is read and only gains bits; at the same time, the actual (on-stack) dr6 value which is read from the hardware only gets bits cleared.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Daniel Thompson <daniel.thompson@linaro.org>
Link: https://lore.kernel.org/r/20200902133201.415372940@infradead.org
-
- 27 Jul 2020, 1 commit
-
-
Committed by Ricardo Neri
Having sync_core() in processor.h is problematic since it is not possible to check for hardware capabilities via the *cpu_has() family of macros. The latter needs the definitions in processor.h. It also looks more intuitive to relocate the function to sync_core.h. This changeset does not make changes in functionality.
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20200727043132.15082-3-ricardo.neri-calderon@linux.intel.com
-
- 25 Jun 2020, 1 commit
-
-
Committed by Peter Zijlstra
Marco crashed in bad_iret with a Clang11/KCSAN build due to overflowing the stack. Now that we run C code on it, expand it to a full page.
Suggested-by: Andy Lutomirski <luto@amacapital.net>
Reported-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200618144801.819246178@infradead.org
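Assuming the stack in question is the entry stack declared in this header, the change conceptually becomes:

```c
/* Sketch: size the entry stack at a full page now that C code runs on it. */
struct entry_stack {
	char stack[PAGE_SIZE];
};

struct entry_stack_page {
	struct entry_stack stack;
} __aligned(PAGE_SIZE);
```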
-
- 18 Jun 2020, 2 commits
-
-
Committed by Thomas Gleixner
save_fsgs_for_kvm() is invoked via

  vcpu_enter_guest()
    kvm_x86_ops.prepare_guest_switch(vcpu)
      vmx_prepare_switch_to_guest()
        save_fsgs_for_kvm()

with preemption disabled, but interrupts enabled.

The upcoming FSGSBASE-based GS save needs interrupts to be disabled. This could be done in the helper function, but that function is also called from switch_to(), which has interrupts disabled already.

Disable interrupts inside save_fsgs_for_kvm() and rename the function to current_save_fsgs() so it can be invoked from other places.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200528201402.1708239-7-sashal@kernel.org
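A sketch of what the renamed helper looks like under that description (close to, but not necessarily identical with, the actual kernel code):

```c
/*
 * Sketch: make the FS/GS save callable from contexts that still have
 * interrupts enabled, e.g. the KVM guest-switch path.
 */
void current_save_fsgs(void)
{
	unsigned long flags;

	/* The FSGSBASE-based GS save requires interrupts to be off. */
	local_irq_save(flags);
	save_fsgs(current);
	local_irq_restore(flags);
}
```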
-
Committed by Chang S. Bae
Add CPU-feature-conditional FSGSBASE access to the relevant helper functions. That allows accelerating certain FS/GS base operations in subsequent changes.

Note that, while possible, the user space entry/exit GSBASE operations are not going to use the new FSGSBASE instructions. The reason is that it would require additional storage for the user space value, which adds more complexity to the low level code, and experiments have shown marginal benefit. This may be revisited later, but for now the SWAPGS-based handling in the entry code is preserved except for the paranoid entry/exit code.

To preserve the SWAPGS entry mechanism, introduce __[rd|wr]gsbase_inactive() helpers. Note, for Xen PV, paravirt hooks can be added later as they might allow a very efficient but different implementation.

[ tglx: Massaged changelog, convert it to noinstr and force inline native_swapgs() ]

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
Link: https://lkml.kernel.org/r/20200528201402.1708239-5-sashal@kernel.org
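A simplified sketch of one of the helpers (the real version additionally carries lockdep and instrumentation annotations):

```c
/*
 * Sketch: read the inactive (user) GS base, either via FSGSBASE bracketed
 * by SWAPGS, or via the MSR on CPUs without the feature.
 */
static noinstr unsigned long __rdgsbase_inactive(void)
{
	unsigned long gsbase;

	if (static_cpu_has(X86_FEATURE_FSGSBASE)) {
		native_swapgs();
		gsbase = rdgsbase();
		native_swapgs();
	} else {
		rdmsrl(MSR_KERNEL_GS_BASE, gsbase);
	}

	return gsbase;
}
```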
-
- 11 Jun 2020, 1 commit
-
-
Committed by Peter Zijlstra
vmlinux.o: warning: objtool: exc_page_fault()+0x9: call to read_cr2() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x24: call to prefetchw() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_page_fault()+0x21: call to kvm_handle_async_pf.isra.0() leaves .noinstr.text section
vmlinux.o: warning: objtool: exc_nmi()+0x1cc: call to write_cr2() leaves .noinstr.text section
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200603114052.243227806@infradead.org
-
- 07 May 2020, 1 commit
-
-
Committed by Reinette Chatre
The original Memory Bandwidth Monitoring (MBM) architectural definition defines counters of up to 62 bits in the IA32_QM_CTR MSR, while the first-generation MBM implementation uses statically defined 24 bit counters.

Expand the MBM CPUID enumeration properties to include the MBM counter width. The previously undefined EAX output register contains, in bits [7:0], the MBM counter width encoded as an offset from 24 bits. Enumerating this property is only specified for Intel CPUs.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/afa3af2f753f6bc301fb743bc8944e749cb24afa.1588715690.git.reinette.chatre@intel.com
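In code, the enumeration described above works out to roughly the following sketch; the constant name is an assumption borrowed from resctrl conventions, and CPUID leaf 0xF subleaf 1 is the L3 monitoring enumeration leaf:

```c
#define MBM_CNTR_WIDTH_BASE	24	/* width is encoded as an offset from 24 */

/* Sketch: derive the MBM counter width from CPUID.0xF.1:EAX[7:0]. */
static unsigned int mbm_counter_width(void)
{
	unsigned int eax, ebx, ecx, edx;

	cpuid_count(0xF, 1, &eax, &ebx, &ecx, &edx);
	return MBM_CNTR_WIDTH_BASE + (eax & 0xff);
}
```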
-
- 22 Apr 2020, 1 commit
-
-
Committed by Peter Zijlstra
Teach objtool a little more about IRET so that we can avoid using the SAVE/RESTORE annotation. In particular, make the weird corner case in insn->restore go away.

The purpose of that corner case is to deal with the fact that UNWIND_HINT_RESTORE lands on the instruction after IRET, but that instruction can end up being outside the basic block, consider:

  if (cond)
    sync_core()
  foo();

Then the hint will land on foo(), and we'll encounter the restore hint without ever having seen the save hint.

By teaching objtool about the arch specific exception frame size, and assuming that any IRET in an STT_FUNC symbol is an exception frame sized POP, we can remove the use of save/restore hints for this code.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Link: https://lkml.kernel.org/r/20200416115118.631224674@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 27 Mar 2020, 1 commit
-
-
Committed by Al Viro
finally
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
- 21 Mar 2020, 1 commit
-
-
Committed by Vincenzo Frascino
Enable x86 to use only the common headers in the implementation of the vDSO library.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200320145351.32292-24-vincenzo.frascino@arm.com
-
- 24 Jan 2020, 1 commit
-
-
Committed by Dave Hansen
From: Dave Hansen <dave.hansen@linux.intel.com>

MPX is being removed from the kernel due to a lack of support in the toolchain going forward (gcc). This removes all the remaining (dead at this point) MPX handling code remaining in the tree. The only remaining code is the XSAVE support for MPX state, which is currently needed for KVM to handle VMs which might use MPX.

Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: x86@kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
-
- 14 Jan 2020, 2 commits
-
-
Committed by Sean Christopherson
Add an entry in struct cpuinfo_x86 to track VMX capabilities and fill the capabilities during IA32_FEAT_CTL MSR initialization.

Make the VMX capabilities dependent on IA32_FEAT_CTL and X86_FEATURE_NAMES so as to avoid unnecessary overhead on CPUs that can't possibly support VMX, or when /proc/cpuinfo is not available.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-11-sean.j.christopherson@intel.com
-
Committed by Sean Christopherson
Add a VMX-specific variant of X86_FEATURE_* flags, which will eventually supplant the synthetic VMX flags defined in cpufeatures word 8. Use the Intel-defined layouts for the major VMX execution controls so that their word entries can be directly populated from their respective MSRs, and so that the VMX_FEATURE_* flags can be used to define the existing bit definitions in asm/vmx.h, i.e. force developers to define a VMX_FEATURE flag when adding support for a new hardware feature.

The majority of Intel's (and compatible CPU's) VMX capabilities are enumerated via MSRs and not CPUID, i.e. querying /proc/cpuinfo doesn't naturally provide any insight into the virtualization capabilities of VMX enabled CPUs. Commit e38e05a8 ("x86: extended "flags" to show virtualization HW feature in /proc/cpuinfo") attempted to address the issue by synthesizing select VMX features into a Linux-defined word in cpufeatures.

Lack of reporting of VMX capabilities via /proc/cpuinfo is problematic because there is no sane way for a user to query the capabilities of their platform, e.g. when trying to find a platform to test a feature or debug an issue that has a hardware dependency. Lack of reporting is especially problematic when the user isn't familiar with VMX, e.g. the format of the MSRs is non-standard, existence of some MSRs is reported by bits in other MSRs, several "features" from KVM's point of view are enumerated as 3+ distinct features by hardware, etc...

The synthetic cpufeatures approach has several flaws:

 - The set of synthesized VMX flags has become extremely stale with respect to the full set of VMX features, e.g. only one new flag (EPT A/D) has been added in the decade since the introduction of the synthetic VMX features. Failure to keep the VMX flags up to date is likely due to the lack of a mechanism that forces developers to consider whether or not a new feature is worth reporting.

 - The synthetic flags may incorrectly be misinterpreted as affecting kernel behavior, i.e. KVM, the kernel's sole consumer of VMX, completely ignores the synthetic flags.

 - New CPU vendors that support VMX have duplicated the hideous code that propagates VMX features from MSRs to cpufeatures. Bringing the synthetic VMX flags up to date would exacerbate the copy+paste trainwreck.

Define separate VMX_FEATURE flags to set the stage for enumerating VMX capabilities outside of the cpu_has() framework, and for adding functional usage of VMX_FEATURE_* to help ensure the features reported via /proc/cpuinfo are up to date with respect to kernel recognition of VMX capabilities.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: https://lkml.kernel.org/r/20191221044513.21680-10-sean.j.christopherson@intel.com
-
- 14 Dec 2019, 1 commit
-
-
Committed by Borislav Petkov
... because it is used only there. No functional changes.
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: x86@kernel.org
Link: https://lkml.kernel.org/r/20191112221823.19677-1-bp@alien8.de
-
- 27 Nov 2019, 3 commits
-
-
Committed by Andy Lutomirski
There are three problems with the current layout of the doublefault stack and TSS. First, the TSS is only cacheline-aligned, which is not enough -- if the hardware portion of the TSS (struct x86_hw_tss) crosses a page boundary, horrible things happen [0]. Second, the stack and TSS are global, so simultaneous double faults on different CPUs will cause massive corruption. Third, the whole mechanism won't work if user CR3 is loaded, resulting in a triple fault [1].

Let the doublefault stack and TSS share a page (which prevents the TSS from spanning a page boundary), make it percpu, and move it into cpu_entry_area. Teach the stack dump code about the doublefault stack.

[0] Real hardware will read past the end of the page onto the next *physical* page if a task switch happens. Virtual machines may have any number of bugs, and I would consider it reasonable for a VM to summarily kill the guest if it tries to task-switch to a page-spanning TSS.

[1] Real hardware triple faults. At least some VMs seem to hang. I'm not sure what's going on.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
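Sharing one page between the stack and the hardware TSS could look roughly like this sketch of the layout (not necessarily the exact structure the patch adds):

```c
/*
 * Sketch: one page holding the doublefault stack plus the hardware TSS,
 * so the TSS can never span a page boundary.
 */
struct doublefault_stack {
	unsigned long stack[(PAGE_SIZE - sizeof(struct x86_hw_tss)) /
			    sizeof(unsigned long)];
	struct x86_hw_tss tss;
} __aligned(PAGE_SIZE);
```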
-
Committed by Andy Lutomirski
The 64-bit doublefault handler is much nicer than the 32-bit one. As a first step toward unifying them, make the 64-bit handler self-contained. This should have no functional effect except in the odd case of x86_64 with CONFIG_DOUBLEFAULT=n, in which case it will change the logging a bit.

This also gets rid of CONFIG_DOUBLEFAULT configurability on 64-bit kernels. It didn't do anything useful -- CONFIG_DOUBLEFAULT=n didn't actually disable doublefault handling on x86_64.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Committed by Ingo Molnar
After the following commit:

  05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")

'struct cpu_entry_area' has to be Kconfig invariant, so that we always have a matching CPU_ENTRY_AREA_PAGES size.

This commit added a CONFIG_X86_IOPL_IOPERM dependency to tss_struct:

  111e7b15: ("x86/ioperm: Extend IOPL config to control ioperm() as well")

Which, if CONFIG_X86_IOPL_IOPERM is turned off, reduces the size of cpu_entry_area by two pages, triggering the assert:

  ./include/linux/compiler.h:391:38: error: call to ‘__compiletime_assert_202’ declared with attribute error: BUILD_BUG_ON failed: (CPU_ENTRY_AREA_PAGES+1)*PAGE_SIZE != CPU_ENTRY_AREA_MAP_SIZE

Simplify the Kconfig dependencies and make cpu_entry_area constant size on 32-bit kernels again.

Fixes: 05b042a1: ("x86/pti/32: Calculate the various PTI cpu_entry_area sizes correctly, make the CPU_ENTRY_AREA_PAGES assert precise")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 16 Nov 2019, 8 commits
-
-
Committed by Thomas Gleixner
If iopl() is disabled, then providing ioperm() does not make much sense. Rename the config option and disable/enable both syscalls with it. Guard the code with #ifdefs where appropriate.
Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Thomas Gleixner
The IOPL emulation via the I/O bitmap is sufficient. Remove the legacy cruft dealing with the (e)flags based IOPL mechanism.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Juergen Gross <jgross@suse.com> (Paravirt and Xen parts)
Acked-by: Andy Lutomirski <luto@kernel.org>
-
Committed by Thomas Gleixner
Access to the full I/O port range can also be provided by the TSS I/O bitmap, but that would require copying 8k of data on scheduling in the task. As shown with the sched-out optimization, TSS.io_bitmap_base can be used to switch the incoming task to a preallocated I/O bitmap which has all bits zero, i.e. allows access to all I/O ports.

Implementing this allows providing an iopl() emulation mode which restricts the IOPL level 3 permissions to I/O port access but removes the STI/CLI permission which comes with the hardware IOPL mechanism.

Provide a config option to switch IOPL to emulation mode, make it the default, and while at it also provide an option to disable IOPL completely.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Andy Lutomirski <luto@kernel.org>
-
Committed by Thomas Gleixner
Add a globally unique sequence number which is incremented when ioperm() is changing the I/O bitmap of a task. Store the new sequence number in the io_bitmap structure and compare it with the sequence number of the I/O bitmap which was last loaded on a CPU. Only update the bitmap if the sequence is different.

That should further reduce the overhead of I/O bitmap scheduling when there are only a few I/O bitmap users on the system.

The 64bit sequence counter is sufficient. A wraparound of the sequence counter assuming an ioperm() call every nanosecond would require about 584 years of uptime.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
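The switch-in path then only needs a comparison before the (expensive) copy; a sketch, with field and constant names chosen for illustration rather than taken from the kernel:

```c
/*
 * Sketch: skip the up-to-8K bitmap copy when this CPU already holds the
 * exact bitmap generation the incoming task uses.
 */
static void tss_update_io_bitmap(struct tss_struct *tss, struct io_bitmap *iobm)
{
	if (tss->io_bitmap_prev_sequence != iobm->sequence) {
		memcpy(tss->io_bitmap_bytes, iobm->bitmap_bytes, iobm->max);
		tss->io_bitmap_prev_sequence = iobm->sequence;
	}
	/* Make the bitmap visible to the hardware permission check. */
	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_VALID;
}
```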
-
Committed by Thomas Gleixner
No point in having all the data in thread_struct, especially as upcoming changes add more. Make the bitmap in the new struct accessible as an array of longs and as an array of characters via a union, so both the bitmap functions and the update logic can avoid type casts.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Thomas Gleixner
Move the non-hardware portion of I/O bitmap data into a separate struct for readability's sake.
Originally-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Thomas Gleixner
There is no requirement to update the TSS I/O bitmap when a thread using it is scheduled out and the incoming thread does not use it.

For the permission check based on the TSS I/O bitmap, the CPU calculates the memory location of the I/O bitmap from the address of the TSS and the io_bitmap_base member of the tss_struct. The easiest way to invalidate the I/O bitmap is to switch the offset to an address outside of the TSS limit. If an I/O instruction is issued from user space, the TSS limit causes #GP to be raised in the same way as a valid I/O bitmap with all bits set to 1 would do.

This removes the extra work when an I/O bitmap using task is scheduled out and puts the burden on the rare I/O bitmap users when they are scheduled in.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
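Invalidation then reduces to a single store; a sketch under the assumption that an offset just past the TSS limit is used as the invalid marker:

```c
/*
 * Sketch: point io_bitmap_base past the TSS limit so any I/O port access
 * from user space takes #GP, exactly as an all-ones bitmap would cause.
 */
#define IO_BITMAP_OFFSET_INVALID	(__KERNEL_TSS_LIMIT + 1)

static inline void tss_invalidate_io_bitmap(struct tss_struct *tss)
{
	tss->x86_tss.io_bitmap_base = IO_BITMAP_OFFSET_INVALID;
}
```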
-
Committed by Fenghua Yu
The x86_capability array in cpuinfo_x86 is of type u32 and thus is naturally aligned to 4 bytes. But, set_bit() and clear_bit() require the array to be aligned to size of unsigned long (i.e. 8 bytes on 64-bit systems).

The array pointer is handed into atomic bit operations. If the access is not aligned to unsigned long then the atomic bit operations can end up crossing a cache line boundary, which causes the CPU to do a full bus lock as it can't lock both cache lines at once. The bus lock operation is heavy weight and can cause severe performance degradation.

The upcoming #AC split lock detection mechanism will issue warnings for this kind of access.

Force the alignment of the array to unsigned long. This avoids the massive code changes which would be required when converting the array data type to unsigned long.

[ tglx: Rewrote changelog so it contains information WHY this is required ]

Suggested-by: David Laight <David.Laight@aculab.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190916223958.27048-4-tony.luck@intel.com
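The change itself is essentially a single attribute on the member; shown here as an excerpt of the struct field, under the assumption that the existing NCAPINTS/NBUGINTS sizing is kept:

```c
/*
 * Sketch: align the capability words to unsigned long so that atomic
 * set_bit()/clear_bit() on the array never straddle a cache line.
 */
__u32 x86_capability[NCAPINTS + NBUGINTS]
	__aligned(sizeof(unsigned long));
```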
-
- 05 Nov 2019, 1 commit
-
-
Committed by Kees Cook
The memory freeing report wasn't very useful for figuring out which parts of the kernel image were being freed. Add the details for clearer reporting in dmesg.

Before:

  Freeing unused kernel image memory: 1348K
  Write protecting the kernel read-only data: 20480k
  Freeing unused kernel image memory: 2040K
  Freeing unused kernel image memory: 172K

After:

  Freeing unused kernel image (initmem) memory: 1348K
  Write protecting the kernel read-only data: 20480k
  Freeing unused kernel image (text/rodata gap) memory: 2040K
  Freeing unused kernel image (rodata/data gap) memory: 172K

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-alpha@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-c6x-dev@linux-c6x.org
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
Cc: Segher Boessenkool <segher@kernel.crashing.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will@kernel.org>
Cc: x86-ml <x86@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Link: https://lkml.kernel.org/r/20191029211351.13243-28-keescook@chromium.org
-