- 23 3月, 2016 1 次提交
-
-
由 MaJun 提交于
As a interrupt controller used on some of hisilicon SOCs(660,1610 etc.), mbigen driver should be enabled when CONFIG_ARCH_HISI is enabled. Signed-off-by: NMa Jun <majun258@huawei.com> Cc: mark.rutland@arm.com Cc: jason@lakedaemon.net Cc: marc.zyngier@arm.com Cc: Catalin.Marinas@arm.com Cc: guohanjun@huawei.com Cc: Will.Deacon@arm.com Cc: huxinwei@huawei.com Cc: lizefan@huawei.com Cc: dingtianhong@huawei.com Cc: zhaojunhua@hisilicon.com Cc: liguozhu@hisilicon.com Cc: linux-arm-kernel@lists.infradead.org Link: http://lkml.kernel.org/r/1458723993-21044-2-git-send-email-majun258@huawei.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 13 3月, 2016 4 次提交
-
-
由 James Hogan 提交于
When calculate_cpu_foreign_map() recalculates the cpu_foreign_map cpumask it uses the local variable temp_foreign_map without initialising it to zero. Since the calculation only ever sets bits in this cpumask any existing bits at that memory location will remain set and find their way into cpu_foreign_map too. This could potentially lead to cache operations suboptimally doing smp calls to multiple VPEs in the same core, even though the VPEs share primary caches. Therefore initialise temp_foreign_map using cpumask_clear() before use. Fixes: cccf34e9 ("MIPS: c-r4k: Fix cache flushing for MT cores") Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12759/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Hauke Mehrtens 提交于
The MIPS_GIC_IPI should only be selected when MIPS_GIC is also selected, otherwise it results in a compile error. smp-gic.c uses some functions from include/linux/irqchip/mips-gic.h like plat_ipi_call_int_xlate() which are only added to the header file when MIPS_GIC is set. The Lantiq SoC does not use the GIC, but supports SMP. The calls top the functions from smp-gic.c are already protected by some #ifdefs The first part of this was introduced in commit 72e20142 ("MIPS: Move GIC IPI functions out of smp-cmp.c") Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de> Cc: Paul Burton <paul.burton@imgtec.com> Cc: stable@vger.kernel.org # v3.15+ Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12774/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Aaro Koskinen 提交于
Ingenic SoC declares ZBOOT support, but debug definitions are missing for MACH_JZ4780 resulting in a build failure when DEBUG_ZBOOT is set. The UART addresses are same as with JZ4740, so fix by covering JZ4780 with those as well. Signed-off-by: NAaro Koskinen <aaro.koskinen@iki.fi> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12830/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Fenghua Yu 提交于
A few new AVX-512 instruction groups/features are added in cpufeatures.h for enuermation: AVX512DQ, AVX512BW, and AVX512VL. Clear the flags in fpu__xstate_clear_all_cpu_caps(). The specification for latest AVX-512 including the features can be found at: https://software.intel.com/sites/default/files/managed/07/b7/319433-023.pdf Note, I didn't enable the flags in KVM. Hopefully the KVM guys can pick up the flags and enable them in KVM. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Gleb Natapov <gleb@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kvm@vger.kernel.org Link: http://lkml.kernel.org/r/1457667498-37357-1-git-send-email-fenghua.yu@intel.com [ Added more detailed feature descriptions. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 3月, 2016 3 次提交
-
-
由 Matt Fleming 提交于
Some machines have EFI regions in page zero (physical address 0x00000000) and historically that region has been added to the e820 map via trim_bios_range(), and ultimately mapped into the kernel page tables. It was not mapped via efi_map_regions() as one would expect. Alexis reports that with the new separate EFI page tables some boot services regions, such as page zero, are not mapped. This triggers an oops during the SetVirtualAddressMap() runtime call. For the EFI boot services quirk on x86 we need to memblock_reserve() boot services regions until after SetVirtualAddressMap(). Doing that while respecting the ownership of regions that may have already been reserved by the kernel was the motivation behind this commit: 7d68dc3f ("x86, efi: Do not reserve boot services regions within reserved areas") That patch was merged at a time when the EFI runtime virtual mappings were inserted into the kernel page tables as described above, and the trick of setting ->numpages (and hence the region size) to zero to track regions that should not be freed in efi_free_boot_services() meant that we never mapped those regions in efi_map_regions(). Instead we were relying solely on the existing kernel mappings. Now that we have separate page tables we need to make sure the EFI boot services regions are mapped correctly, even if someone else has already called memblock_reserve(). Instead of stashing a tag in ->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it generally makes no sense to mark a boot services region as required at runtime, it's pretty much guaranteed the firmware will not have already set this bit. For the record, the specific circumstances under which Alexis triggered this bug was that an EFI runtime driver on his machine was responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during SetVirtualAddressMap(). The event handler for this driver looks like this, sub rsp,0x28 lea rdx,[rip+0x2445] # 0xaa948720 mov ecx,0x4 call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720) mov r11,QWORD PTR [rip+0x2434] # 0xaa948720 xor eax,eax mov BYTE PTR [r11+0x1],0x1 add rsp,0x28 ret Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting instruction because ConvertPointer() was being called to convert the address 0x0000000000000000, which when converted is left unchanged and remains 0x0000000000000000. The output of the oops trace gave the impression of a standard NULL pointer dereference bug, but because we're accessing physical addresses during ConvertPointer(), it wasn't. EFI boot services code is stored at that address on Alexis' machine. Reported-by: NAlexis Murzeau <amurzeau@gmail.com> Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raphael Hertzog <hertzog@debian.org> Cc: Roger Shimizu <rogershimizu@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
i486 derived cores like Intel Quark support only the very old, legacy x87 FPU (FSAVE/FRSTOR, CPUID bit FXSR is not set), and our FPU code wasn't handling the saving and restoring there properly in the 'eagerfpu' case. So after we made eagerfpu the default for all CPU types: 58122bf1 x86/fpu: Default eagerfpu=on on all CPUs these old FPU designs broke. First, Andy Shevchenko reported a splat: WARNING: CPU: 0 PID: 823 at arch/x86/include/asm/fpu/internal.h:163 fpu__clear+0x8c/0x160 which was us trying to execute FXRSTOR on those machines even though they don't support it. After taking care of that, Bryan O'Donoghue reported that a simple FPU test still failed because we weren't initializing the FPU state properly on those machines. Take care of all that. Reported-and-tested-by: NBryan O'Donoghue <pure.logic@nexus-software.ie> Reported-by: NAndy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yu-cheng <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/20160311113206.GD4312@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Petazzoni 提交于
When the Crypto SRAM mappings were added to the Device Tree files describing the Armada XP boards in commit c466d997 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards"), the fact that those mappings were overlaping with the PCIe memory aperture was overlooked. Due to this, we currently have for all Armada XP platforms a situation that looks like this: Memory mapping on Armada XP boards with internal registers at 0xf1000000: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf8000000 -> 0xffe0000 126M PCIe memory aperture - 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE ! - 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE ! - 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture - 0xfff0000 -> 0xffffffff 1M BootROM The overlap means that when PCIe devices are added, depending on their memory window needs, they might or might not be mapped into the physical address space. Indeed, they will not be mapped if the area allocated in the PCIe memory aperture by the PCI core overlaps with one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB of PCIe memory will see its PCIe memory window allocated from 0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due to this, the PCIe window is not created, and any attempt to access the PCIe window makes the kernel explode: [ 3.302213] igb: Copyright (c) 2007-2014 Intel Corporation. [ 3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143) [ 3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window [ 3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22 [ 3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018 This problem does not occur on Armada 370 boards, because we use the following memory mapping (for boards that have internal registers at 0xf1000000): - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK ! - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Obviously, the solution is to align the location of the Crypto SRAM mappings of Armada XP to be similar with the ones on Armada 370, i.e have them between the "internal registers" area and the beginning of the PCIe aperture. However, we have a special case with the OpenBlocks AX3-4 platform, which has a 128 MB NOR flash. Currently, this NOR flash is mapped from 0xf0000000 to 0xf8000000. This is possible because on OpenBlocks AX3-4, the internal registers are not at 0xf1000000. And this explains why the Crypto SRAM mappings were not configured at the same place on Armada XP. Hence, the solution is two-fold: (1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from 0xe8000000 to 0xf0000000. This frees the 0xf0000000 -> 0xf80000000 space. (2) Move the Crypto SRAM mappings on Armada XP to be similar to Armada 370 (except of course that Armada XP has two Crypto SRAM and not one). After this patch, the memory mapping on Armada XP boards with registers at 0xf1 is: - 0x00000000 -> 0xf0000000 3.75G RAM - 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB) - 0xf1000000 -> 0xf1100000 1M internal registers - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM And the memory mapping for the special case of the OpenBlocks AX3-4 (internal registers at 0xd0000000, NOR of 128 MB): - 0x00000000 -> 0xc0000000 3G RAM - 0xd0000000 -> 0xd1000000 1M internal registers - 0xe800000 -> 0xf0000000 128M NOR flash - 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 - 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1 - 0xf8000000 -> 0xffe0000 126M PCIe memory - 0xffe00000 -> 0xfff00000 1M PCIe I/O - 0xfff0000 -> 0xffffffff 1M BootROM Fixes: c466d997 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards") Reported-by: NPhil Sutter <phil@nwl.cc> Cc: Phil Sutter <phil@nwl.cc> Cc: <stable@vger.kernel.org> Signed-off-by: NThomas Petazzoni <thomas.petazzoni@free-electrons.com> Acked-by: NGregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: NOlof Johansson <olof@lixom.net>
-
- 11 3月, 2016 2 次提交
-
-
由 Hector Marco-Gisbert 提交于
Currently on i386 and on X86_64 when emulating X86_32 in legacy mode, only the stack and the executable are randomized but not other mmapped files (libraries, vDSO, etc.). This patch enables randomization for the libraries, vDSO and mmap requests on i386 and in X86_32 in legacy mode. By default on i386 there are 8 bits for the randomization of the libraries, vDSO and mmaps which only uses 1MB of VA. This patch preserves the original randomness, using 1MB of VA out of 3GB or 4GB. We think that 1MB out of 3GB is not a big cost for having the ASLR. The first obvious security benefit is that all objects are randomized (not only the stack and the executable) in legacy mode which highly increases the ASLR effectiveness, otherwise the attackers may use these non-randomized areas. But also sensitive setuid/setgid applications are more secure because currently, attackers can disable the randomization of these applications by setting the ulimit stack to "unlimited". This is a very old and widely known trick to disable the ASLR in i386 which has been allowed for too long. Another trick used to disable the ASLR was to set the ADDR_NO_RANDOMIZE personality flag, but fortunately this doesn't work on setuid/setgid applications because there is security checks which clear Security-relevant flags. This patch always randomizes the mmap_legacy_base address, removing the possibility to disable the ASLR by setting the stack to "unlimited". Signed-off-by: NHector Marco-Gisbert <hecmargi@upv.es> Acked-by: NIsmael Ripoll Ripoll <iripoll@upv.es> Acked-by: NKees Cook <keescook@chromium.org> Acked-by: NArjan van de Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Cc: kees Cook <keescook@chromium.org> Link: http://lkml.kernel.org/r/1457639460-5242-1-git-send-email-hecmargi@upv.esSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jianyu Zhan 提交于
Commit abd4f750 ("x86: i386-show-unhandled-signals-v3") did turn on the showing-unhandled-signal behaviour for i386 for some exception handlers, but for no reason do_trap() is left out (my naive guess is because turning it on for do_trap() would be too noisy since do_trap() is shared by several exceptions). And since the same commit make "show_unhandled_signals" a debug tunable(in /proc/sys/debug/exception-trace), and x86 by default turning it on. So it would be strange for i386 users who turing it on manually and expect seeing the unhandled signal output in log, but nothing. This patch turns it on for i386 in do_trap() as well. Signed-off-by: NJianyu Zhan <nasa4836@gmail.com> Reviewed-by: NJan Beulich <jbeulich@suse.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: dave.hansen@linux.intel.com Cc: heukelum@fastmail.fm Cc: jbeulich@novell.com Cc: jdike@addtoit.com Cc: joe@perches.com Cc: luto@kernel.org Link: http://lkml.kernel.org/r/1457612398-4568-1-git-send-email-nasa4836@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 10 3月, 2016 19 次提交
-
-
由 Borislav Petkov 提交于
We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly accessed per-cpu var as the MONITORX target in delay_mwaitx(). However, when called in preemptible context, this_cpu_ptr -> smp_processor_id() -> debug_smp_processor_id() fires: BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312 caller is delay_mwaitx+0x40/0xa0 But we don't care about that check - we only need cpu_tss as a MONITORX target and it doesn't really matter which CPU's var we're touching as we're going idle anyway. Fix that. Suggested-by: NAndy Lutomirski <luto@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: spg_linux_kernel@amd.com Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Paolo Bonzini 提交于
KVM has special logic to handle pages with pte.u=1 and pte.w=0 when CR0.WP=1. These pages' SPTEs flip continuously between two states: U=1/W=0 (user and supervisor reads allowed, supervisor writes not allowed) and U=0/W=1 (supervisor reads and writes allowed, user writes not allowed). When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0, making the two states U=1/W=0/NX=gpte.NX and U=0/W=1/NX=1. When guest EFER has the NX bit cleared, the reserved bit check thinks that the latter state is invalid; teach it that the smep_andnot_wp case will also use the NX bit of SPTEs. Cc: stable@vger.kernel.org Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.inel.com> Fixes: c258b62bSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
Yes, all of these are needed. :) This is admittedly a bit odd, but kvm-unit-tests access.flat tests this if you run it with "-cpu host" and of course ept=0. KVM runs the guest with CR0.WP=1, so it must handle supervisor writes specially when pte.u=1/pte.w=0/CR0.WP=0. Such writes cause a fault when U=1 and W=0 in the SPTE, but they must succeed because CR0.WP=0. When KVM gets the fault, it sets U=0 and W=1 in the shadow PTE and restarts execution. This will still cause a user write to fault, while supervisor writes will succeed. User reads will fault spuriously now, and KVM will then flip U and W again in the SPTE (U=1, W=0). User reads will be enabled and supervisor writes disabled, going back to the originary situation where supervisor writes fault spuriously. When SMEP is in effect, however, U=0 will enable kernel execution of this page. To avoid this, KVM also sets NX=1 in the shadow PTE together with U=0. If the guest has not enabled NX, the result is a continuous stream of page faults due to the NX bit being reserved. The fix is to force EFER.NX=1 even if the CPU is taking care of the EFER switch. (All machines with SMEP have the CPU_LOAD_IA32_EFER vm-entry control, so they do not use user-return notifiers for EFER---if they did, EFER.NX would be forced to the same value as the host). There is another bug in the reserved bit check, which I've split to a separate patch for easier application to stable kernels. Cc: stable@vger.kernel.org Cc: Andy Lutomirski <luto@amacapital.net> Reviewed-by: NXiao Guangrong <guangrong.xiao@linux.intel.com> Fixes: f6577a5fSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Lutomirski 提交于
Now that slow-path syscalls always enter C before enabling interrupts, it's straightforward to call enter_from_user_mode() before enabling interrupts rather than doing it as part of entry tracing. With this change, we should finally be able to retire exception_enter(). This will also enable optimizations based on knowing that we never change context tracking state with interrupts on. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/bc376ecf87921a495e874ff98139b1ca2f5c5dd7.1457558566.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
We want all of the syscall entries to run with interrupts off so that we can efficiently run context tracking before enabling interrupts. This will regress int $0x80 performance on 32-bit kernels by a couple of cycles. This shouldn't matter much -- int $0x80 is not a fast path. This effectively reverts: 657c1eea ("x86/entry/32: Fix entry_INT80_32() to expect interrupts to be on") ... and fixes the same issue differently. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/59b4f90c9ebfccd8c937305dbbbca680bc74b905.1457558566.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yu-cheng Yu 提交于
Leonid Shatz noticed that the SDM interpretation of the following recent commit: 394db20c ("x86/fpu: Disable AVX when eagerfpu is off") ... is incorrect and that the original behavior of the FPU code was correct. Because AVX is not stated in CR0 TS bit description, it was mistakenly believed to be not supported for lazy context switch. This turns out to be false: Intel Software Developer's Manual Vol. 3A, Sec. 2.5 Control Registers: 'TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task.' Intel Software Developer's Manual Vol. 2A, Sec. 2.4 Instruction Exception Specification: 'AVX instructions refer to exceptions by classes that include #NM "Device Not Available" exception for lazy context switch.' So revert the commit. Reported-by: NLeonid Shatz <leonid.shatz@ravellosystems.com> Signed-off-by: NYu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1457569734-3785-1-git-send-email-yu-cheng.yu@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
Ingo suggested that the comments should explain when the various entries are used. This adds these explanations and improves other parts of the comments. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9524ecef7a295347294300045d08354d6a57c6e7.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
Now that SYSENTER with TF set puts X86_EFLAGS_TF directly into regs->flags, we don't need a TIF_SINGLESTEP fixup in the syscall entry code. Remove it. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/2d15f24da52dafc9d2f0b8d76f55544f4779c517.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
The first instruction of the SYSENTER entry runs on its own tiny stack. That stack can be used if a #DB or NMI is delivered before the SYSENTER prologue switches to a real stack. We have code in place to prevent us from overflowing the tiny stack. For added paranoia, add a canary to the stack and check it in do_debug() -- that way, if something goes wrong with the #DB logic, we'll eventually notice. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/6ff9a806f39098b166dc2c41c1db744df5272f29.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
Right after SYSENTER, we can get a #DB or NMI. On x86_32, there's no IST, so the exception handler is invoked on the temporary SYSENTER stack. Because the SYSENTER stack is very small, we have a fixup to switch off the stack quickly when this happens. The old fixup had several issues: 1. It checked the interrupt frame's CS and EIP. This wasn't obviously correct on Xen or if vm86 mode was in use [1]. 2. In the NMI handler, it did some frightening digging into the stack frame. I'm not convinced this digging was correct. 3. The fixup didn't switch stacks and then switch back. Instead, it synthesized a brand new stack frame that would redirect the IRET back to the SYSENTER code. That frame was highly questionable. For one thing, if NMI nested inside #DB, we would effectively abort the #DB prologue, which was probably safe but was frightening. For another, the code used PUSHFL to write the FLAGS portion of the frame, which was simply bogus -- by the time PUSHFL was called, at least TF, NT, VM, and all of the arithmetic flags were clobbered. Simplify this considerably. Instead of looking at the saved frame to see where we came from, check the hardware ESP register against the SYSENTER stack directly. Malicious user code cannot spoof the kernel ESP register, and by moving the check after SAVE_ALL, we can use normal PER_CPU accesses to find all the relevant addresses. With this patch applied, the improved syscall_nt_32 test finally passes on 32-bit kernels. [1] It isn't obviously correct, but it is nonetheless safe from vm86 shenanigans as far as I can tell. A user can't point EIP at entry_SYSENTER_32 while in vm86 mode because entry_SYSENTER_32, like all kernel addresses, is greater than 0xffff and would thus violate the CS segment limit. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/b2cdbc037031c07ecf2c40a96069318aec0e7971.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
The SYSENTER stack is only used on 32-bit kernels. Remove it on 64-bit kernels. ( We may end up using it down the road on 64-bit kernels. If so, we'll re-enable it for CONFIG_IA32_EMULATION. ) Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/9dbd18429f9ff61a76b6eda97a9ea20510b9f6ba.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
Due to a blatant design error, SYSENTER doesn't clear TF (single-step). As a result, if a user does SYSENTER with TF set, we will single-step through the kernel until something clears TF. There is absolutely nothing we can do to prevent this short of turning off SYSENTER [1]. Simplify the handling considerably with two changes: 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC. We can add TF to that list of flags to sanitize with no overhead whatsoever. 2. Teach do_debug() to ignore single-step traps in the SYSENTER prologue. That's all we need to do. Don't get too excited -- our handling is still buggy on 32-bit kernels. There's nothing wrong with the SYSENTER code itself, but the #DB prologue has a clever fixup for traps on the very first instruction of entry_SYSENTER_32, and the fixup doesn't work quite correctly. The next two patches will fix that. [1] We could probably prevent it by forcing BTF on at all times and making sure we clear TF before any branches in the SYSENTER code. Needless to say, this is a bad idea. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/a30d2ea06fe4b621fe6a9ef911b02c0f38feb6f2.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
Leaving any bits set in DR6 on return from a debug exception is asking for trouble. Prevent it by writing zero right away and clarify the comment. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3857676e1be8fb27db4b89bbb1e2052b7f435ff4.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
The SDM says that debug exceptions clear BTF, and we need to keep TIF_BLOCKSTEP in sync with BTF. Clear it unconditionally and improve the comment. I suspect that the fact that kmemcheck could cause TIF_BLOCKSTEP not to be cleared was just an oversight. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/fa86e55d196e6dde5b38839595bde2a292c52fdc.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
We weren't restoring FLAGS at all on SYSEXIT. Apparently no one cared. With this patch applied, native kernels should always honor task_pt_regs()->flags, which opens the door for some sys_iopl() cleanups. I'll do those as a separate series, though, since getting it right will involve tweaking some paravirt ops. ( The short version is that, before this patch, sys_iopl(), invoked via SYSENTER, wasn't guaranteed to ever transfer the updated regs->flags, so sys_iopl() had to change the hardware flags register as well. ) Reported-by: NBrian Gerst <brgerst@gmail.com> Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/3f98b207472dc9784838eb5ca2b89dcc845ce269.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
This makes the 32-bit code work just like the 64-bit code. It should speed up syscalls on 32-bit kernels on Skylake by something like 20 cycles (by analogy to the 64-bit compat case). It also cleans up NT just like we do for the 64-bit case. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/07daef3d44bd1ed62a2c866e143e8df64edb40ee.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
CLAC is slow, and the SYSENTER code already has an unlikely path that runs if unusual flags are set. Drop the CLAC and instead rely on the unlikely path to clear AC. This seems to save ~24 cycles on my Skylake laptop. (Hey, Intel, make this faster please!) Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/90d6db2189f9add83bc7bddd75a0c19ebbd676b2.1457578375.git.luto@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Martin Schwidefsky 提交于
The fork of a process with four page table levels is broken since git commit 6252d702 "[S390] dynamic page tables." All new mm contexts are created with three page table levels and an asce limit of 4TB. If the parent has four levels dup_mmap will add vmas to the new context which are outside of the asce limit. The subsequent call to copy_page_range will walk the three level page table structure of the new process with non-zero pgd and pud indexes. This leads to memory clobbers as the pgd_index *and* the pud_index is added to the mm->pgd pointer without a pgd_deref in between. The init_new_context() function is selecting the number of page table levels for a new context. The function is used by mm_init() which in turn is called by dup_mm() and mm_alloc(). These two are used by fork() and exec(). The init_new_context() function can distinguish the two cases by looking at mm->context.asce_limit, for fork() the mm struct has been copied and the number of page table levels may not change. For exec() the mm_alloc() function set the new mm structure to zero, in this case a three-level page table is created as the temporary stack space is located at STACK_TOP_MAX = 4TB. This fixes CVE-2016-2143. Reported-by: NMarcin Kościelnicki <koriakin@0x04.net> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Mark Rutland 提交于
Functions which the compiler has instrumented for KASAN place poison on the stack shadow upon entry and remove this poison prior to returning. In the case of cpuidle, CPUs exit the kernel a number of levels deep in C code. Any instrumented functions on this critical path will leave portions of the stack shadow poisoned. If CPUs lose context and return to the kernel via a cold path, we restore a prior context saved in __cpu_suspend_enter are forgotten, and we never remove the poison they placed in the stack shadow area by functions calls between this and the actual exit of the kernel. Thus, (depending on stackframe layout) subsequent calls to instrumented functions may hit this stale poison, resulting in (spurious) KASAN splats to the console. To avoid this, clear any stale poison from the idle thread for a CPU prior to bringing a CPU online. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Alexander Potapenko <glider@google.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 09 3月, 2016 7 次提交
-
-
由 Will Deacon 提交于
Commit 66b3923a ("arm64: hugetlb: add support for PTE contiguous bit") introduced support for huge pages using the contiguous bit in the PTE as opposed to block mappings, which may be slightly unwieldy (512M) in 64k page configurations. Unfortunately, this support has resulted in some late regressions when running the libhugetlbfs test suite with 64k pages and CONFIG_DEBUG_VM as a result of a BUG: | readback (2M: 64): ------------[ cut here ]------------ | kernel BUG at fs/hugetlbfs/inode.c:446! | Internal error: Oops - BUG: 0 [#1] SMP | Modules linked in: | CPU: 7 PID: 1448 Comm: readback Not tainted 4.5.0-rc7 #148 | Hardware name: linux,dummy-virt (DT) | task: fffffe0040964b00 ti: fffffe00c2668000 task.ti: fffffe00c2668000 | PC is at remove_inode_hugepages+0x44c/0x480 | LR is at remove_inode_hugepages+0x264/0x480 Rather than revert the entire patch, simply avoid advertising the contiguous huge page sizes for now while people are actively working on a fix. This patch can then be reverted once things have been sorted out. Cc: David Woods <dwoods@ezchip.com> Reported-by: NSteve Capper <steve.capper@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Ard Biesheuvel 提交于
Commit dfd55ad8 ("arm64: vmemmap: use virtual projection of linear region") fixed an issue where the struct page array would overflow into the adjacent virtual memory region if system RAM was placed so high up in physical memory that its addresses were not representable in the build time configured virtual address size. However, the fix failed to take into account that the vmemmap region needs to be relatively aligned with respect to the sparsemem section size, so that a sequence of page structs corresponding with a sparsemem section in the linear region appears naturally aligned in the vmemmap region. So round up vmemmap to sparsemem section size. Since this essentially moves the projection of the linear region up in memory, also revert the reduction of the size of the vmemmap region. Cc: <stable@vger.kernel.org> Fixes: dfd55ad8 ("arm64: vmemmap: use virtual projection of linear region") Tested-by: NMark Langsdorf <mlangsdo@redhat.com> Tested-by: NDavid Daney <david.daney@cavium.com> Tested-by: NRobert Richter <rrichter@cavium.com> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
由 Luis R. Rodriguez 提交于
Rename dma_*_writecombine() to dma_*_wc(), so that the naming is coherent across the various write-combining APIs. Keep the old names for compatibility for a while, these can be removed at a later time. A guard is left to enable backporting of the rename, and later remove of the old mapping defines seemlessly. Build tested successfully with allmodconfig. The following Coccinelle SmPL patch was used for this simple transformation: @ rename_dma_alloc_writecombine @ expression dev, size, dma_addr, gfp; @@ -dma_alloc_writecombine(dev, size, dma_addr, gfp) +dma_alloc_wc(dev, size, dma_addr, gfp) @ rename_dma_free_writecombine @ expression dev, size, cpu_addr, dma_addr; @@ -dma_free_writecombine(dev, size, cpu_addr, dma_addr) +dma_free_wc(dev, size, cpu_addr, dma_addr) @ rename_dma_mmap_writecombine @ expression dev, vma, cpu_addr, dma_addr, size; @@ -dma_mmap_writecombine(dev, vma, cpu_addr, dma_addr, size) +dma_mmap_wc(dev, vma, cpu_addr, dma_addr, size) We also keep the old names as compatibility helpers, and guard against their definition to make backporting easier. Generated-by: Coccinelle SmPL Suggested-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: airlied@linux.ie Cc: akpm@linux-foundation.org Cc: benh@kernel.crashing.org Cc: bhelgaas@google.com Cc: bp@suse.de Cc: dan.j.williams@intel.com Cc: daniel.vetter@ffwll.ch Cc: dhowells@redhat.com Cc: julia.lawall@lip6.fr Cc: konrad.wilk@oracle.com Cc: linux-fbdev@vger.kernel.org Cc: linux-pci@vger.kernel.org Cc: luto@amacapital.net Cc: mst@redhat.com Cc: tomi.valkeinen@ti.com Cc: toshi.kani@hp.com Cc: vinod.koul@intel.com Cc: xen-devel@lists.xensource.com Link: http://lkml.kernel.org/r/1453516462-4844-1-git-send-email-mcgrof@do-not-panic.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Sync it to the Kconfig default for 32-bit. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: tim.gardner@canonical.com Link: http://lkml.kernel.org/r/20160309134821.GD6564@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
After fixing FPU option parsing, we now parse the 'no387' boot option too early: no387 clears X86_FEATURE_FPU before it's even probed, so the boot CPU promptly re-enables it. I suspect it gets even more confused on SMP. Fix the probing code to leave X86_FEATURE_FPU off if it's been disabled by setup_clear_cpu_cap(). Signed-off-by: NAndy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Fixes: 4f81cbaf ("x86/fpu: Fix early FPU command-line parsing") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tony Luck 提交于
Make use of the EXTABLE_FAULT exception table entries to write a kernel copy routine that doesn't crash the system if it encounters a machine check. Prime use case for this is to copy from large arrays of non-volatile memory used as storage. We have to use an unrolled copy loop for now because current hardware implementations treat a machine check in "rep mov" as fatal. When that is fixed we can simplify. Return type is a "bool". True means that we copied OK, false means that it didn't. Signed-off-by: NTony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@gmail.com> Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Drop the quirk() function pointer in favor of a simple boolean which says whether the quirk should be applied or not. Update comment while at it. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <andi.kleen@intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jacob Pan <jacob.jun.pan@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-tip-commits@vger.kernel.org Link: http://lkml.kernel.org/r/20160308164041.GF16568@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 08 3月, 2016 4 次提交
-
-
由 Andy Lutomirski 提交于
It no longer has any users. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Lutomirski 提交于
x86_64 has very clean espfix handling on paravirt: espfix64 is set up in native_iret, so paravirt systems that override iret bypass espfix64 automatically. This is robust and straightforward. x86_32 is messier. espfix is set up before the IRET paravirt patch point, so it can't be directly conditionalized on whether we use native_iret. We also can't easily move it into native_iret without regressing performance due to a bizarre consideration. Specifically, on 64-bit kernels, the logic is: if (regs->ss & 0x4) setup_espfix; On 32-bit kernels, the logic is: if ((regs->ss & 0x4) && (regs->cs & 0x3) == 3 && (regs->flags & X86_EFLAGS_VM) == 0) setup_espfix; The performance of setup_espfix itself is essentially irrelevant, but the comparison happens on every IRET so its performance matters. On x86_64, there's no need for any registers except flags to implement the comparison, so we fold the whole thing into native_iret. On x86_32, we don't do that because we need a free register to implement the comparison efficiently. We therefore do espfix setup before restoring registers on x86_32. This patch gets rid of the explicit paravirt_enabled check by introducing X86_BUG_ESPFIX on 32-bit systems and using an ALTERNATIVE to skip espfix on paravirt systems where iret != native_iret. This is also messy, but it's at least in line with other things we do. This improves espfix performance by removing a branch, but no one cares. More importantly, it removes a paravirt_enabled user, which is good because paravirt_enabled is ill-defined and is going away. Signed-off-by: NAndy Lutomirski <luto@kernel.org> Reviewed-by: NBorislav Petkov <bp@suse.de> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: david.vrabel@citrix.com Cc: konrad.wilk@oracle.com Cc: lguest@lists.ozlabs.org Cc: xen-devel@lists.xensource.com Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kostenzer Felix 提交于
ignore_nmis is used in two distinct places: 1. modified through {stop,restart}_nmi by alternative_instructions 2. read by do_nmi to determine if default_do_nmi should be called or not thus the access pattern conforms to __read_mostly and do_nmi() is a fastpath. Signed-off-by: NKostenzer Felix <fkostenzer@live.at> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 David Hildenbrand 提交于
With MACHINE_HAS_VX, we convert the floating point registers from the vector registeres when storing the status. For other VCPUs, these are stored to vcpu->run->s.regs.vrs, but we are using current->thread.fpu.vxrs, which resolves to the currently loaded VCPU. So kvm_s390_store_status_unloaded() currently writes the wrong floating point registers (converted from the vector registers) when called from another VCPU on a z13. This is only the case for old user space not handling SIGP STORE STATUS and SIGP STOP AND STORE STATUS, but relying on the kernel implementation. All other calls come from the loaded VCPU via kvm_s390_store_status(). Fixes: 9abc2a08 (KVM: s390: fix memory overwrites when vx is disabled) Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Cc: stable@vger.kernel.org # v4.4+ Signed-off-by: NDavid Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-