- 08 7月, 2020 1 次提交
-
-
由 Kan Liang 提交于
CPUID.(EAX=07H, ECX=0):EDX[19] indicates whether an Intel CPU supports Architectural LBRs. The "X86_FEATURE_..., word 18" is already mirrored from CPUID "0x00000007:0 (EDX)". Add X86_FEATURE_ARCH_LBR under the "word 18" section. The feature will appear as "arch_lbr" in /proc/cpuinfo. The Architectural Last Branch Records (LBR) feature enables recording of software path history by logging taken branches and other control flows. The feature will be supported in the perf_events subsystem. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDave Hansen <dave.hansen@intel.com> Link: https://lkml.kernel.org/r/1593780569-62993-2-git-send-email-kan.liang@linux.intel.com
-
- 02 7月, 2020 5 次提交
-
-
由 Like Xu 提交于
When a guest wants to use the LBR registers, its hypervisor creates a guest LBR event and let host perf schedules it. The LBR records msrs are accessible to the guest when its guest LBR event is scheduled on by the perf subsystem. Before scheduling this event out, we should avoid host changes on IA32_DEBUGCTLMSR or LBR_SELECT. Otherwise, some unexpected branch operations may interfere with guest behavior, pollute LBR records, and even cause host branches leakage. In addition, the read operation on host is also avoidable. To ensure that guest LBR records are not lost during the context switch, the guest LBR event would enable the callstack mode which could save/restore guest unread LBR records with the help of intel_pmu_lbr_sched_task() naturally. However, the guest LBR_SELECT may changes for its own use and the host LBR event doesn't save/restore it. To ensure that we doesn't lost the guest LBR_SELECT value when the guest LBR event is running, the vlbr_constraint is bound up with a new constraint flag PERF_X86_EVENT_LBR_SELECT. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200514083054.62538-6-like.xu@linux.intel.com
-
由 Like Xu 提交于
The hypervisor may request the perf subsystem to schedule a time window to directly access the LBR records msrs for its own use. Normally, it would create a guest LBR event with callstack mode enabled, which is scheduled along with other ordinary LBR events on the host but in an exclusive way. To avoid wasting a counter for the guest LBR event, the perf tracks its hw->idx via INTEL_PMC_IDX_FIXED_VLBR and assigns it with a fake VLBR counter with the help of new vlbr_constraint. As with the BTS event, there is actually no hardware counter assigned for the guest LBR event. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200514083054.62538-5-like.xu@linux.intel.com
-
由 Like Xu 提交于
The LBR records msrs are model specific. The perf subsystem has already obtained the base addresses of LBR records based on the cpu model. Therefore, an interface is added to allow callers outside the perf subsystem to obtain these LBR information. It's useful for hypervisors to emulate the LBR feature for guests with less code. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200613080958.132489-4-like.xu@linux.intel.com
-
由 Like Xu 提交于
For intel_pmu_en/disable_event(), reorder the branches checks for hw->idx and make them sorted by probability: gp,fixed,bts,others. Clean up the x86_assign_hw_event() by converting multiple if-else statements to a switch statement. To skip x86_perf_event_update() and x86_perf_event_set_period(), it's generic to replace "idx == INTEL_PMC_IDX_FIXED_BTS" check with '!hwc->event_base' because that should be 0 for all non-gp/fixed cases. Wrap related bit operations into intel_set/clear_masks() and make the main path more cleaner and readable. No functional changes. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Original-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200613080958.132489-3-like.xu@linux.intel.com
-
由 Wei Wang 提交于
The MSR variable type can be 'unsigned int', which uses less memory than the longer 'unsigned long'. Fix 'struct x86_pmu' for that. The lbr_nr won't be a negative number, so make it 'unsigned int' as well. Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200613080958.132489-2-like.xu@linux.intel.com
-
- 28 6月, 2020 1 次提交
-
-
由 Patrice Chotard 提交于
This reverts commit 7b8e0188. Initially, STiH410-B2260 was supposed to be secured, that's why l2c_write_sec was stubbed to avoid secure register access from non secure world. But by default, STiH410-B2260 is running in non secure mode, so L2 cache register accesses are authorized, l2c_write_sec stub is not needed. With this patch, L2 cache is configured and performance are enhanced. Link: https://lore.kernel.org/r/20200618172456.29475-1-patrice.chotard@st.comSigned-off-by: NPatrice Chotard <patrice.chotard@st.com> Cc: Alain Volmat <alain.volmat@st.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de>
-
- 26 6月, 2020 7 次提交
-
-
由 Christoph Hellwig 提交于
Use PAGE_KERNEL_ROX directly instead of allocating RWX and setting the page read-only just after the allocation. Link: http://lkml.kernel.org/r/20200618064307.32739-3-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Patch series "fix a hyperv W^X violation and remove vmalloc_exec" Dexuan reported a W^X violation due to the fact that the hyper hypercall page due switching it to be allocated using vmalloc_exec. The problem is that PAGE_KERNEL_EXEC as used by vmalloc_exec actually sets writable permissions in the pte. This series fixes the issue by switching to the low-level __vmalloc_node_range interface that allows specifing more detailed permissions instead. It then also open codes the other two callers and removes the somewhat confusing vmalloc_exec interface. Peter noted that the hyper hypercall page allocation also has another long standing issue in that it shouldn't use the full vmalloc but just the module space. This issue is so far theoretical as the allocation is done early in the boot process. I plan to fix it with another bigger series for 5.9. This patch (of 3): Avoid a W^X violation cause by the fact that PAGE_KERNEL_EXEC includes the writable bit. For this resurrect the removed PAGE_KERNEL_RX definition, but as PAGE_KERNEL_ROX to match arm64 and powerpc. Link: http://lkml.kernel.org/r/20200618064307.32739-2-hch@lst.de Fixes: 78bb17f7 ("x86/hyperv: use vmalloc_exec for the hypercall page") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NDexuan Cui <decui@microsoft.com> Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com> Acked-by: NWei Liu <wei.liu@kernel.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Jessica Yu <jeyu@kernel.org> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stafford Horne 提交于
Since v5.8-rc1 OpenRISC Linux fails to boot when DEBUG_VM is enabled. This has been bisected to commit 42fc5414 ("mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()"). The added locking checks exposed the issue that OpenRISC was not taking this mmap lock when during page walks for DMA operations. This patch locks and unlocks the mmap lock for page walking. Link: http://lkml.kernel.org/r/20200617090247.1680188-1-shorne@gmail.com Fixes: 42fc5414 ("mmap locking API: add mmap_assert_locked() and mmap_assert_write_locked()" Signed-off-by: NStafford Horne <shorne@gmail.com> Reviewed-by: NMichel Lespinasse <walken@google.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Steven Price <steven.price@arm.com> Cc: Thomas Hellstrom <thellstrom@vmware.com> Cc: Robin Murphy <robin.murphy@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Guo Ren 提交于
For linux-5.8-rc1, enable ftrace of riscv will cause boot panic: [ 2.388980] Run /sbin/init as init process [ 2.529938] init[39]: unhandled signal 4 code 0x1 at 0x0000003ff449e000 [ 2.531078] CPU: 0 PID: 39 Comm: init Not tainted 5.8.0-rc1-dirty #13 [ 2.532719] epc: 0000003ff449e000 ra : 0000003ff449e954 sp : 0000003fffedb900 [ 2.534005] gp : 00000000000e8528 tp : 0000003ff449d800 t0 : 000000000000001e [ 2.534965] t1 : 000000000000000a t2 : 0000003fffedb89e s0 : 0000003fffedb920 [ 2.536279] s1 : 0000003fffedb940 a0 : 0000003ff43d4b2c a1 : 0000000000000000 [ 2.537334] a2 : 0000000000000001 a3 : 0000000000000000 a4 : fffffffffbad8000 [ 2.538466] a5 : 0000003ff449e93a a6 : 0000000000000000 a7 : 0000000000000000 [ 2.539511] s2 : 0000000000000000 s3 : 0000003ff448412c s4 : 0000000000000010 [ 2.541260] s5 : 0000000000000016 s6 : 00000000000d0a30 s7 : 0000003fffedba70 [ 2.542152] s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000003fffedb960 [ 2.543335] s11: 0000000000000000 t3 : 0000000000000000 t4 : 0000003fffedb8a0 [ 2.544471] t5 : 0000000000000000 t6 : 0000000000000000 [ 2.545730] status: 0000000000004020 badaddr: 00000000464c457f cause: 0000000000000002 [ 2.549867] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000004 [ 2.551267] CPU: 0 PID: 1 Comm: init Not tainted 5.8.0-rc1-dirty #13 [ 2.552061] Call Trace: [ 2.552626] [<ffffffe00020374a>] walk_stackframe+0x0/0xc4 [ 2.553486] [<ffffffe0002039f4>] show_stack+0x40/0x4c [ 2.553995] [<ffffffe00054a6ae>] dump_stack+0x7a/0x98 [ 2.554615] [<ffffffe00020b9b8>] panic+0x114/0x2f4 [ 2.555395] [<ffffffe00020ebd6>] do_exit+0x89c/0x8c2 [ 2.555949] [<ffffffe00020f930>] do_group_exit+0x3a/0x90 [ 2.556715] [<ffffffe000219e08>] get_signal+0xe2/0x6e6 [ 2.557388] [<ffffffe000202d72>] do_notify_resume+0x6a/0x37a [ 2.558089] [<ffffffe000201c16>] ret_from_exception+0x0/0xc "ra:0x3ff449e954" is the return address of "call _mcount" in the prologue of __vdso_gettimeofday(). Without proper relocate, pc jmp to 0x0000003ff449e000 (vdso map base) with a illegal instruction trap. The solution comes from arch/arm64/kernel/vdso/Makefile: CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os $(CC_FLAGS_SCS) - CC_FLAGS_SCS is ShadowCallStack feature in Clang and only implemented for arm64, no use for riscv. Fixes: ad5d1122 ("riscv: use vDSO common flow to reduce the latency of the time-related functions") Cc: stable@vger.kernel.org Signed-off-by: NGuo Ren <guoren@linux.alibaba.com> Reviewed-by: NVincent Chen <vincent.chen@sifive.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
由 Vincent Chen 提交于
Add extern declarations for vDSO time-related functions to notify the compiler these functions will be used in somewhere to avoid "no previous prototype" compile warning. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NVincent Chen <vincent.chen@sifive.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
由 Vincent Chen 提交于
The time related vDSO functions use a variable, vdso_data, to access the vDSO data page to get the system time information. Because the vdso_data for CFLAGS_vgettimeofday.o is an external variable defined in vdso.o, the CFLAGS_vgettimeofday.o should be compiled with -fPIC to ensure that vdso_data is addressable. Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NVincent Chen <vincent.chen@sifive.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
由 Sai Prakash Ranjan 提交于
QCOM KRYO{3,4}XX silver/LITTLE CPU cores are based on Cortex-A55 and are SSB safe, hence add them to SSB safelist -> arm64_ssb_cpus[]. Reported-by: NStephen Boyd <swboyd@chromium.org> Signed-off-by: NSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: NDouglas Anderson <dianders@chromium.org> Link: https://lore.kernel.org/r/20200625103123.7240-1-saiprakash.ranjan@codeaurora.orgSigned-off-by: NWill Deacon <will@kernel.org>
-
- 25 6月, 2020 4 次提交
-
-
由 Jiping Ma 提交于
A 32-bit perf querying the registers of a compat task using REGS_ABI_32 will receive zeroes from w15, when it expects to find the PC. Return the PC value for register dwarf register 15 when returning register values for a compat task to perf. Cc: <stable@vger.kernel.org> Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NJiping Ma <jiping.ma2@windriver.com> Link: https://lore.kernel.org/r/1589165527-188401-1-git-send-email-jiping.ma2@windriver.com [will: Shuffled code and added a comment] Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Peter Zijlstra 提交于
vmlinux.o: warning: objtool: exc_invalid_op()+0x47: call to probe_kernel_read() leaves .noinstr.text section Since we use UD2 as a short-cut for 'CALL __WARN', treat it as such. Have the bare exception handler do the report_bug() thing. Fixes: 15a416e8 ("x86/entry: Treat BUG/WARN as NMI-like entries") Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NAndy Lutomirski <luto@kernel.org> Link: https://lkml.kernel.org/r/20200622114713.GE577403@hirez.programming.kicks-ass.net
-
由 Peter Zijlstra 提交于
Marco crashed in bad_iret with a Clang11/KCSAN build due to overflowing the stack. Now that we run C code on it, expand it to a full page. Suggested-by: NAndy Lutomirski <luto@amacapital.net> Reported-by: NMarco Elver <elver@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NLai Jiangshan <jiangshanlai@gmail.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200618144801.819246178@infradead.org
-
由 Peter Zijlstra 提交于
vmlinux.o: warning: objtool: fixup_bad_iret()+0x8e: call to memcpy() leaves .noinstr.text section Worse, when KASAN there is no telling what memcpy() actually is. Force the use of __memcpy() which is our assmebly implementation. Reported-by: NMarco Elver <elver@google.com> Suggested-by: NMarco Elver <elver@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200618144801.760070502@infradead.org
-
- 24 6月, 2020 6 次提交
-
-
由 Sai Prakash Ranjan 提交于
QCOM KRYO{3,4}XX silver/LITTLE CPU cores are based on Cortex-A55 and are meltdown safe, hence add them to kpti_safe_list[]. Signed-off-by: NSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Link: https://lore.kernel.org/r/20200624123406.3472-1-saiprakash.ranjan@codeaurora.orgSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Jean-Philippe Brucker 提交于
Some ftrace features are broken since commit 714a8d02 ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI"). For example the function_graph tracer: $ echo function_graph > /sys/kernel/debug/tracing/current_tracer [ 36.107016] WARNING: CPU: 0 PID: 115 at kernel/trace/ftrace.c:2691 ftrace_modify_all_code+0xc8/0x14c When ftrace_modify_graph_caller() attempts to write a branch at ftrace_graph_call, it finds the "BTI J" instruction inserted by SYM_INNER_LABEL() instead of a NOP, and aborts. It turns out we don't currently need the BTI landing pads inserted by SYM_INNER_LABEL: * ftrace_call and ftrace_graph_call are only used for runtime patching of the active tracer. The patched code is not reached from a branch. * install_el2_stub is reached from a CBZ instruction, which doesn't change PSTATE.BTYPE. * __guest_exit is reached from B instructions in the hyp-entry vectors, which aren't subject to BTI checks either. Remove the BTI annotation from SYM_INNER_LABEL. Fixes: 714a8d02 ("arm64: asm: Override SYM_FUNC_START when building the kernel with BTI") Signed-off-by: NJean-Philippe Brucker <jean-philippe@linaro.org> Reviewed-by: NMark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20200624112253.1602786-1-jean-philippe@linaro.orgSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Alexander Popov 提交于
Don't use gcc plugins for building arch/arm64/kernel/vdso/vgettimeofday.c to avoid unneeded instrumentation. Signed-off-by: NAlexander Popov <alex.popov@linux.com> Link: https://lore.kernel.org/r/20200624123330.83226-4-alex.popov@linux.comSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Will Deacon 提交于
Commit 87676cfc ("arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline") unconditionally passes the '--no-eh-frame-hdr' option to the linker when building the native vDSO in an attempt to prevent generation of the .eh_frame_hdr section, the presence of which has been implicated in segfaults originating from the libgcc unwinder. Unfortunately, not all versions of binutils support this option, which has been shown to cause build failures in linux-next: | CALL scripts/atomic/check-atomics.sh | CALL scripts/checksyscalls.sh | LD arch/arm64/kernel/vdso/vdso.so.dbg | ld: unrecognized option '--no-eh-frame-hdr' | ld: use the --help option for usage information | arch/arm64/kernel/vdso/Makefile:64: recipe for target | 'arch/arm64/kernel/vdso/vdso.so.dbg' failed | make[1]: *** [arch/arm64/kernel/vdso/vdso.so.dbg] Error 1 | arch/arm64/Makefile:175: recipe for target 'vdso_prepare' failed | make: *** [vdso_prepare] Error 2 Only link the vDSO with '--no-eh-frame-hdr' when the linker supports it. If we end up with the section due to linker defaults, the absence of CFI information in the sigreturn trampoline will prevent the unwinder from breaking. Link: https://lore.kernel.org/r/7a7e31a8-9a7b-2428-ad83-2264f20bdc2d@hisilicon.com Fixes: 87676cfc ("arm64: vdso: Disable dwarf unwinding through the sigreturn trampoline") Reported-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Tested-by: NJon Hunter <jonathanh@nvidia.com> Signed-off-by: NWill Deacon <will@kernel.org>
-
由 yu kuai 提交于
if of_find_device_by_node() succeed, imx6q_suspend_init() doesn't have a corresponding put_device(). Thus add a jump target to fix the exception handling for this function implementation. Signed-off-by: Nyu kuai <yukuai3@huawei.com> Signed-off-by: NShawn Guo <shawnguo@kernel.org>
-
由 yu kuai 提交于
if of_find_device_by_node() succeed, imx_suspend_alloc_ocram() doesn't have a corresponding put_device(). Thus add a jump target to fix the exception handling for this function implementation. Fixes: 1579c7b9 ("ARM: imx53: Set DDR pins to high impedance when in suspend to RAM.") Signed-off-by: Nyu kuai <yukuai3@huawei.com> Signed-off-by: NShawn Guo <shawnguo@kernel.org>
-
- 23 6月, 2020 16 次提交
-
-
由 Mark Brown 提交于
Versions of binutils prior to 2.33.1 don't understand the ELF notes that are added by modern compilers to indicate the PAC and BTI options used to build the code. This causes them to emit large numbers of warnings in the form: aarch64-linux-gnu-nm: warning: .tmp_vmlinux.kallsyms2: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0000000 during the kernel build which is currently causing quite a bit of disruption for automated build testing using clang. In commit 15cd0e67 (arm64: Kconfig: ptrauth: Add binutils version check to fix mismatch) we added a dependency on binutils to avoid this issue when building with versions of GCC that emit the notes but did not do so for clang as it was believed that the existing check for .cfi_negate_ra_state was already requiring a new enough binutils. This does not appear to be the case for some versions of binutils (eg, the binutils in Debian 10) so instead refactor so we require a new enough GNU binutils in all cases other than when we are using an old GCC version that does not emit notes. Other, more exotic, combinations of tools are possible such as using clang, lld and gas together are possible and may have further problems but rather than adding further version checks it looks like the most robust thing will be to just test that we can build cleanly with the configured tools but that will require more review and discussion so do this for now to address the immediate problem disrupting build testing. Reported-by: NKernelCI <bot@kernelci.org> Reported-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NMark Brown <broonie@kernel.org> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1054 Link: https://lore.kernel.org/r/20200619123550.48098-1-broonie@kernel.orgSigned-off-by: NWill Deacon <will@kernel.org>
-
由 Will Deacon 提交于
The sigreturn code in the compat vDSO is unused. Remove it. Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Will Deacon 提交于
The 32-bit sigreturn trampoline in the compat sigpage matches the binary representation of the arch/arm/ sigpage exactly. This is important for debuggers (e.g. GDB) and unwinders (e.g. libunwind) since they rely on matching the instruction sequence in order to identify that they are unwinding through a signal. The same cannot be said for the sigreturn trampoline in the compat vDSO, which defeats the unwinder heuristics and instead attempts to use unwind directives for the unwinding. This is in contrast to arch/arm/, which never uses the vDSO for sigreturn. Ensure compatibility with arch/arm/ and existing unwinders by always using the sigpage for the sigreturn trampoline, regardless of the presence of the compat vDSO. Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Will Deacon 提交于
In preparation for removing the signal trampoline from the compat vDSO, allow the sigpage and the compat vDSO to co-exist. For the moment the vDSO signal trampoline will still be used when built. Subsequent patches will move to the sigpage consistently. Acked-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Will Deacon 提交于
Commit 7e9f5e66 ("arm64: vdso: Add --eh-frame-hdr to ldflags") results in a .eh_frame_hdr section for the vDSO, which in turn causes the libgcc unwinder to unwind out of signal handlers using the .eh_frame information populated by our .cfi directives. In conjunction with a4eb355a ("arm64: vdso: Fix CFI directives in sigreturn trampoline"), this has been shown to cause segmentation faults originating from within the unwinder during thread cancellation: | Thread 14 "virtio-net-rx" received signal SIGSEGV, Segmentation fault. | 0x0000000000435e24 in uw_frame_state_for () | (gdb) bt | #0 0x0000000000435e24 in uw_frame_state_for () | #1 0x0000000000436e88 in _Unwind_ForcedUnwind_Phase2 () | #2 0x00000000004374d8 in _Unwind_ForcedUnwind () | #3 0x0000000000428400 in __pthread_unwind (buf=<optimized out>) at unwind.c:121 | #4 0x0000000000429808 in __do_cancel () at ./pthreadP.h:304 | #5 sigcancel_handler (sig=32, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:200 | #6 sigcancel_handler (sig=<optimized out>, si=0xffff33c743f0, ctx=<optimized out>) at nptl-init.c:165 | #7 <signal handler called> | #8 futex_wait_cancelable (private=0, expected=0, futex_word=0x3890b708) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 After considerable bashing of heads, it appears that our CFI directives for unwinding out of the sigreturn trampoline are only processed by libgcc when both a .eh_frame_hdr section is present *and* the mysterious NOP is covered by an entry in .eh_frame. With both of these now in place, it has highlighted that our CFI directives are not comprehensive enough to restore the stack pointer of the interrupted context. This results in libgcc falling back to an arm64-specific unwinder after computing a bogus PC value from the unwind tables. The unwinder promptly dereferences this bogus address in an attempt to see if the pointed-to instruction sequence looks like the sigreturn trampoline. Restore the old unwind behaviour, which relied solely on heuristics in the unwinder, by removing the .eh_frame_hdr section from the vDSO and commenting out the insufficient CFI directives for now. Add comments to explain the current, miserable state of affairs. Cc: Tamas Zsoldos <tamas.zsoldos@arm.com> Cc: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Kiss <daniel.kiss@arm.com> Acked-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: NArd Biesheuvel <ardb@kernel.org> Reported-by: NArd Biesheuvel <ardb@kernel.org> Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Christian Borntraeger 提交于
When specifying insanely large debug buffers a kernel warning is printed. The debug code does handle the error gracefully, though. Instead of duplicating the check let us silence the warning to avoid crashes when panic_on_warn is used. Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Vasily Gorbik 提交于
Currently if early_pgm_check_handler is called it ends up in pgm check loop. The problem is that early_pgm_check_handler is instrumented by KASAN but executed without DAT flag enabled which leads to addressing exception when KASAN checks try to access shadow memory. Fix that by executing early handlers with DAT flag on under KASAN as expected. Reported-and-tested-by: NAlexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Sven Schnelle 提交于
When single stepping an svc instruction on s390, the kernel is entered with a PER program check interruption. The program check handler than jumps to the system call handler by reloading the PSW. The code didn't set GPR13 to the thread pointer in struct task_struct. This made the kernel access invalid memory while trying to fetch the syscall function address. Fix this by always assigned GPR13 after .Lsysc_per. Fixes: 0b0ed657 ("s390: remove critical section cleanup from entry.S") Reported-and-tested-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NSven Schnelle <svens@linux.ibm.com> Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
-
由 Sean Christopherson 提交于
Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended. Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Marcelo Tosatti 提交于
The Linux TSC calibration procedure is subject to small variations (its common to see +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processor can fail, in case of a small variation in calibrated TSC between them. Without TSC scaling, the current kernel interface will either return an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode. This change enables the following TSC tolerance check to accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250ppm by default). /* * Compute the variation in TSC rate which is acceptable * within the range of tolerance and decide if the * rate being applied is within that bounds of the hardware * rate. If so, no scaling or compensation need be done. */ thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm); thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm); if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) { pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi); use_scaling = 1; } NTP daemon in the guest can correct this difference (NTP can correct upto 500ppm). Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com> Message-Id: <20200616114741.GA298183@fuller.cnet> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Xiaoyao Li 提交于
Only MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated for accessing APIC registers in x2APIC mode. Fixes: 0105d1a5 ("KVM: x2apic interface to lapic") Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com> Cc: stable@vger.kernel.org Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Frieder Schrempf 提交于
The WDOG_ANY signal is connected to the RESET_IN signal of the SoM and baseboard. It is currently configured as push-pull, which means that if some external device like a programmer wants to assert the RESET_IN signal by pulling it to ground, it drives against the high level WDOG_ANY output of the SoC. To fix this we set the WDOG_ANY signal to open-drain configuration. That way we make sure that the RESET_IN can be asserted by the watchdog as well as by external devices. Fixes: 1ea4b76c ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: NFrieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: NShawn Guo <shawnguo@kernel.org>
-
由 Frieder Schrempf 提交于
The watchdog's WDOG_ANY signal is used to trigger a POR of the SoC, if a soft reset is issued. As the SoM hardware connects the WDOG_ANY and the POR signals, the watchdog node itself and the pin configuration should be part of the common SoM devicetree. Let's move it from the baseboard's devicetree to its proper place. Fixes: 1ea4b76c ("ARM: dts: imx6ul-kontron-n6310: Add Kontron i.MX6UL N6310 SoM and boards") Cc: stable@vger.kernel.org Signed-off-by: NFrieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: NShawn Guo <shawnguo@kernel.org>
-
由 Sean Christopherson 提交于
Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible. As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads at the detriment to workloads running in the host or on other VMs. Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation. Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR. Furthermore, even if giving the guest control of the MSR is legitimate, e.g. in pass-through scenarios, it's not clear that the benefits would outweigh the overhead. E.g. saving and restoring an MSR across a VMX roundtrip costs ~250 cycles, and if the guest diverged from the host that cost would be paid on every run of the guest. In other words, if there is a legitimate use case then it should be enabled by a new per-VM capability. Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can correctly expose other WAITPKG features to the guest, e.g. TPAUSE, UMWAIT and UMONITOR. Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL") Cc: stable@vger.kernel.org Cc: Jingqi Liu <jingqi.liu@intel.com> Cc: Tao Xu <tao3.xu@intel.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA. Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
translate_gpa() returns a GPA, assigning it to 'real_gfn' seems obviously wrong. There is no real issue because both 'gpa_t' and 'gfn_t' are u64 and we don't use the value in 'real_gfn' as a GFN, we do real_gfn = gpa_to_gfn(real_gfn); instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust your eyes', but let's fix it for good. No functional change intended. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20200622151435.752560-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-