- 27 8月, 2022 2 次提交
-
-
由 Stephane Eranian 提交于
Existing code was generating bogus counts for the SNB IMC bandwidth counters: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000327813 1,024.03 MiB uncore_imc/data_reads/ 1.000327813 20.73 MiB uncore_imc/data_writes/ 2.000580153 261,120.00 MiB uncore_imc/data_reads/ 2.000580153 23.28 MiB uncore_imc/data_writes/ The problem was introduced by commit: 07ce734d ("perf/x86/intel/uncore: Clean up client IMC") Where the read_counter callback was replace to point to the generic uncore_mmio_read_counter() function. The SNB IMC counters are freerunnig 32-bit counters laid out contiguously in MMIO. But uncore_mmio_read_counter() is using a readq() call to read from MMIO therefore reading 64-bit from MMIO. Although this is okay for the uncore_perf_event_update() function because it is shifting the value based on the actual counter width to compute a delta, it is not okay for the uncore_pmu_event_start() which is simply reading the counter and therefore priming the event->prev_count with a bogus value which is responsible for causing bogus deltas in the perf stat command above. The fix is to reintroduce the custom callback for read_counter for the SNB IMC PMU and use readl() instead of readq(). With the change the output of perf stat is back to normal: $ perf stat -a -I 1000 -e uncore_imc/data_reads/,uncore_imc/data_writes/ 1.000120987 296.94 MiB uncore_imc/data_reads/ 1.000120987 138.42 MiB uncore_imc/data_writes/ 2.000403144 175.91 MiB uncore_imc/data_reads/ 2.000403144 68.50 MiB uncore_imc/data_writes/ Fixes: 07ce734d ("perf/x86/intel/uncore: Clean up client IMC") Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NKan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20220803160031.1379788-1-eranian@google.com
-
由 Mikulas Patocka 提交于
There are several places in the kernel where wait_on_bit is not followed by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read). On architectures with weak memory ordering, it may happen that memory accesses that follow wait_on_bit are reordered before wait_on_bit and they may return invalid data. Fix this class of bugs by introducing a new function "test_bit_acquire" that works like test_bit, but has acquire memory ordering semantics. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Acked-by: NWill Deacon <will@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 8月, 2022 2 次提交
-
-
由 Borislav Petkov 提交于
Mark both the function prototype and definition as noreturn in order to prevent the compiler from doing transformations which confuse objtool like so: vmlinux.o: warning: objtool: sme_enable+0x71: unreachable instruction This triggers with gcc-12. Add it and sev_es_terminate() to the objtool noreturn tracking array too. Sort it while at it. Suggested-by: NMichael Matz <matz@suse.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220824152420.20547-1-bp@alien8.de
-
由 Lukas Bulwahn 提交于
Commit c70727a5 ("xen: allow more than 512 GB of RAM for 64 bit pv-domains") from July 2015 replaces the config XEN_MAX_DOMAIN_MEMORY with a new config XEN_512GB, but misses to adjust arch/x86/configs/xen.config. As XEN_512GB defaults to yes, there is no need to explicitly set any config in xen.config. Just remove setting the obsolete config XEN_MAX_DOMAIN_MEMORY. Signed-off-by: NLukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: NJuergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20220817044333.22310-1-lukas.bulwahn@gmail.comSigned-off-by: NJuergen Gross <jgross@suse.com>
-
- 24 8月, 2022 3 次提交
-
-
由 Tom Lendacky 提交于
When running identity-mapped and depending on the kernel configuration, it is possible that the compiler uses jump tables when generating code for cc_platform_has(). This causes a boot failure because the jump table uses un-mapped kernel virtual addresses, not identity-mapped addresses. This has been seen with CONFIG_RETPOLINE=n. Similar to sme_encrypt_kernel(), use an open-coded direct check for the status of SNP rather than trying to eliminate the jump table. This preserves any code optimization in cc_platform_has() that can be useful post boot. It also limits the changes to SEV-specific files so that future compiler features won't necessarily require possible build changes just because they are not compatible with running identity-mapped. [ bp: Massage commit message. ] Fixes: 5e5ccff6 ("x86/sev: Add helper for validating pages in early enc attribute changes") Reported-by: NSean Christopherson <seanjc@google.com> Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 5.19.x Link: https://lore.kernel.org/all/YqfabnTRxFSM+LoX@google.com/
-
由 Michael Roth 提交于
In some cases, bootloaders will leave boot_params->cc_blob_address uninitialized rather than zeroing it out. This field is only meant to be set by the boot/compressed kernel in order to pass information to the uncompressed kernel when SEV-SNP support is enabled. Therefore, there are no cases where the bootloader-provided values should be treated as anything other than garbage. Otherwise, the uncompressed kernel may attempt to access this bogus address, leading to a crash during early boot. Normally, sanitize_boot_params() would be used to clear out such fields but that happens too late: sev_enable() may have already initialized it to a valid value that should not be zeroed out. Instead, have sev_enable() zero it out unconditionally beforehand. Also ensure this happens for !CONFIG_AMD_MEM_ENCRYPT as well by also including this handling in the sev_enable() stub function. [ bp: Massage commit message and comments. ] Fixes: b190a043 ("x86/sev: Add SEV-SNP feature detection/setup") Reported-by: NJeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: watnuss@gmx.de Signed-off-by: NMichael Roth <michael.roth@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216387 Link: https://lore.kernel.org/r/20220823160734.89036-1-michael.roth@amd.com
-
由 Tony Luck 提交于
Note1: Model 0xB7 already claimed the "no suffix" #define for a regular client part, so add (yet another) suffix "S" to distinguish this new part from the earlier one. Note2: the RAPTORLAKE* and ALDERLAKE* processors are very similar from a software enabling point of view. There are no known features that have model-specific enabling and also differ between the two. In other words, every single place that list *one* or more RAPTORLAKE* or ALDERLAKE* processors should list all of them. Note3: This is being merged before there is an in-tree user. Merging this provides an "anchor" so that the different folks can update their subsystems (like perf) in parallel to use this define and test it. [ dhansen: add a note about why this has no in-tree users yet ] Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220823174819.223941-1-tony.luck@intel.com
-
- 22 8月, 2022 1 次提交
-
-
由 Nick Desaulniers 提交于
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0. The minimum supported versions of these tools for the build according to Documentation/process/changes.rst are 5.1 and 11.0.0 respectively. Remove the feature detection script, Kconfig option, and clean up some fallback code that is no longer supported. The removed script was also testing for a GCC specific bug that was fixed in the 4.7 release. Also remove workarounds for bpftrace using clang older than 9.0.0, since other BPF backend fixes are required at this point. Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/ Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637Acked-by: NBorislav Petkov <bp@suse.de> Suggested-by: NMasahiro Yamada <masahiroy@kernel.org> Suggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Reviewed-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NNathan Chancellor <nathan@kernel.org> Reviewed-by: NAlexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 8月, 2022 1 次提交
-
-
由 Chen Zhongjin 提交于
When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176. If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address. By adding the base address ftrace_{regs_}caller with the offset *ip - ops->trampoline*, we can get the correct address to find the ORC entry. Also change "caller" to "tramp_addr" to make variable name conform to its content. [ mingo: Clarified the changelog a bit. ] Fixes: 6be7fa3c ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Reviewed-by: NSteven Rostedt (Google) <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com
-
- 20 8月, 2022 4 次提交
-
-
由 Kan Liang 提交于
According to the latest event list, the LOAD_LATENCY PEBS event only works on the GP counter 0 and 1 for ADL and RPL. Update the pebs event constraints table. Fixes: f83d2f91 ("perf/x86/intel: Add Alder Lake Hybrid support") Reported-by: NAmmy Yi <ammy.yi@intel.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220818184429.2355857-1-kan.liang@linux.intel.com
-
由 Stephane Eranian 提交于
With the existing code in store_latency_data(), the memory operation (mem_op) returned to the user is always OP_LOAD where in fact, it should be OP_STORE. This comes from the fact that the function is simply grabbing the information from a data source map which covers only load accesses. Intel 12th gen CPU offers precise store sampling that captures both the data source and latency. Therefore it can use the data source mapping table but must override the memory operation to reflect stores instead of loads. Fixes: 61b985e3 ("perf/x86/intel: Add perf core PMU support for Sapphire Rapids") Signed-off-by: NStephane Eranian <eranian@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220818054613.1548130-1-eranian@google.com
-
由 Peter Zijlstra 提交于
The SDM explicitly states that PEBS Baseline implies Extended PEBS. For cpu model forward compatibility (e.g. on ICX, SPR, ADL), it's safe to stop doing FMS table thing such as setting pebs_capable and PMU_FL_PEBS_ALL since it's already set in the intel_ds_init(). The Goldmont Plus is the only platform which supports extended PEBS but doesn't have Baseline. Keep the status quo. Reported-by: NLike Xu <likexu@tencent.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NKan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220816114057.51307-1-likexu@tencent.com
-
由 Kan Liang 提交于
On the platform with Arch LBR, the HW raw branch type encoding may leak to the perf tool when the SAVE_TYPE option is not set. In the intel_pmu_store_lbr(), the HW raw branch type is stored in lbr_entries[].type. If the SAVE_TYPE option is set, the lbr_entries[].type will be converted into the generic PERF_BR_* type in the intel_pmu_lbr_filter() and exposed to the user tools. But if the SAVE_TYPE option is NOT set by the user, the current perf kernel doesn't clear the field. The HW raw branch type leaks. There are two solutions to fix the issue for the Arch LBR. One is to clear the field if the SAVE_TYPE option is NOT set. The other solution is to unconditionally convert the branch type and expose the generic type to the user tools. The latter is implemented here, because - The branch type is valuable information. I don't see a case where you would not benefit from the branch type. (Stephane Eranian) - Not having the branch type DOES NOT save any space in the branch record (Stephane Eranian) - The Arch LBR HW can retrieve the common branch types from the LBR_INFO. It doesn't require the high overhead SW disassemble. Fixes: 47125db2 ("perf/x86/intel/lbr: Support Architectural LBR") Reported-by: NStephane Eranian <eranian@google.com> Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20220816125612.2042397-1-kan.liang@linux.intel.com
-
- 19 8月, 2022 8 次提交
-
-
由 Aaron Lu 提交于
Commit c164fbb4("x86/mm: thread pgprot_t through init_memory_mapping()") mistakenly used __pgprot() which doesn't respect __default_kernel_pte_mask when setting PUD mapping. Fix it by only setting the one bit we actually need (PSE) and leaving the other bits (that have been properly masked) alone. Fixes: c164fbb4 ("x86/mm: thread pgprot_t through init_memory_mapping()") Signed-off-by: NAaron Lu <aaron.lu@intel.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Peter Zijlstra 提交于
Turns out that i386 doesn't unconditionally have LFENCE, as such the loop in __FILL_RETURN_BUFFER isn't actually speculation safe on such chips. Fixes: ba6e31af ("x86/speculation: Add LFENCE to RSB fill sequence") Reported-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Yv9tj9vbQ9nNlXoY@worktop.programming.kicks-ass.net
-
由 Peter Zijlstra 提交于
Commit 2b129932 ("x86/speculation: Add RSB VM Exit protections") made a right mess of the RSB stuffing, rewrite the whole thing to not suck. Thanks to Andrew for the enlightening comment about Post-Barrier RSB things so we can make this code less magical. Cc: stable@vger.kernel.org Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/YvuNdDWoUZSBjYcm@worktop.programming.kicks-ass.net
-
由 Josh Poimboeuf 提交于
The following BUG was reported: traps: Missing ENDBR: andw_ax_dx+0x0/0x10 [kvm] ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/traps.c:253! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI <TASK> asm_exc_control_protection+0x2b/0x30 RIP: 0010:andw_ax_dx+0x0/0x10 [kvm] Code: c3 cc cc cc cc 0f 1f 44 00 00 66 0f 1f 00 48 19 d0 c3 cc cc cc cc 0f 1f 40 00 f3 0f 1e fa 20 d0 c3 cc cc cc cc 0f 1f 44 00 00 <66> 0f 1f 00 66 21 d0 c3 cc cc cc cc 0f 1f 40 00 66 0f 1f 00 21 d0 ? andb_al_dl+0x10/0x10 [kvm] ? fastop+0x5d/0xa0 [kvm] x86_emulate_insn+0x822/0x1060 [kvm] x86_emulate_instruction+0x46f/0x750 [kvm] complete_emulated_mmio+0x216/0x2c0 [kvm] kvm_arch_vcpu_ioctl_run+0x604/0x650 [kvm] kvm_vcpu_ioctl+0x2f4/0x6b0 [kvm] ? wake_up_q+0xa0/0xa0 The BUG occurred because the ENDBR in the andw_ax_dx() fastop function had been incorrectly "sealed" (converted to a NOP) by apply_ibt_endbr(). Objtool marked it to be sealed because KVM has no compile-time references to the function. Instead KVM calculates its address at runtime. Prevent objtool from annotating fastop functions as sealable by creating throwaway dummy compile-time references to the functions. Fixes: 6649fa87 ("x86/ibt,kvm: Add ENDBR to fastops") Reported-by: NPengfei Xu <pengfei.xu@intel.com> Debugged-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Message-Id: <0d4116f90e9d0c1b754bb90c585e6f0415a1c508.1660837839.git.jpoimboe@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Josh Poimboeuf 提交于
SETCC_ALIGN and FOP_ALIGN are both 16. Remove the special casing for FOP_SETCC() and just make it a normal fastop. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Josh Poimboeuf 提交于
Add a macro which prevents a function from getting sealed if there are no compile-time references to it. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chao Peng 提交于
The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chao Peng 提交于
KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM internally used (invisible to userspace) and avoids confusion to future private slots that can have different meaning. Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com> Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 8月, 2022 1 次提交
-
-
由 Pawan Gupta 提交于
Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: 8d50cdf8 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: NAndrew Cooper <andrew.cooper3@citrix.com> Suggested-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
-
- 17 8月, 2022 1 次提交
-
-
由 Linus Torvalds 提交于
The exception for the "unaligned access at the end of the page, next page not mapped" never happens, but the fixup code ends up causing trouble for compilers to optimize well. clang in particular ends up seeing it being in the middle of a loop, and tries desperately to optimize the exception fixup code that is never really reached. The simple solution is to just move all the fixups into the exception handler itself, which moves it all out of the hot case code, and means that the compiler never sees it or needs to worry about it. Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 16 8月, 2022 1 次提交
-
-
由 Juergen Gross 提交于
Commit c89191ce ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") missed one use case of SWAPGS in entry_INT80_compat(). Removing of the SWAPGS macro led to asm just using "swapgs", as it is accepting instructions in capital letters, too. This in turn leads to splats in Xen PV guests like: [ 36.145223] general protection fault, maybe for address 0x2d: 0000 [#1] PREEMPT SMP NOPTI [ 36.145794] CPU: 2 PID: 1847 Comm: ld-linux.so.2 Not tainted 5.19.1-1-default #1 \ openSUSE Tumbleweed f3b44bfb672cdb9f235aff53b57724eba8b9411b [ 36.146608] Hardware name: HP ProLiant ML350p Gen8, BIOS P72 11/14/2013 [ 36.148126] RIP: e030:entry_INT80_compat+0x3/0xa3 Fix that by open coding this single instance of the SWAPGS macro. Fixes: c89191ce ("x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS") Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NJan Beulich <jbeulich@suse.com> Cc: <stable@vger.kernel.org> # 5.19 Link: https://lore.kernel.org/r/20220816071137.4893-1-jgross@suse.com
-
- 15 8月, 2022 1 次提交
-
-
由 Jan Beulich 提交于
After commit ID in the Fixes: tag, pat_enabled() returns false (because of PAT initialization being suppressed in the absence of MTRRs being announced to be available). This has become a problem: the i915 driver now fails to initialize when running PV on Xen (i915_gem_object_pin_map() is where I located the induced failure), and its error handling is flaky enough to (at least sometimes) result in a hung system. Yet even beyond that problem the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses would have been quite a bit less optimal than possible. Arrange for the function to return true in such environments, without undermining the rest of PAT MSR management logic considering PAT to be disabled: specifically, no writes to the PAT MSR should occur. For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before). [ bp: This is the "small fix" variant for stable. It'll get replaced with a proper PAT and MTRR detection split upstream but that is too involved for a stable backport. - additional touchups to commit msg. Use cpu_feature_enabled(). ] Fixes: bdd8b6c9 ("drm/i915: replace X86_FEATURE_PAT with pat_enabled()") Signed-off-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NIngo Molnar <mingo@kernel.org> Cc: <stable@vger.kernel.org> Cc: Juergen Gross <jgross@suse.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/9385fa60-fa5d-f559-a137-6608408f88b0@suse.com
-
- 14 8月, 2022 1 次提交
-
-
由 Nadav Amit 提交于
When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is performed if (ZF == 1 or SF != OF). However the kernel emulation currently uses 'and' instead of 'or'. As a result, setting a kprobe on JNG/JNLE might cause the kernel to behave incorrectly whenever the kprobe is hit. Fix by changing the 'and' to 'or'. Fixes: 6256e668 ("x86/kprobes: Use int3 instead of debug trap for single-step") Signed-off-by: NNadav Amit <namit@vmware.com> Signed-off-by: NIngo Molnar <mingo@kernel.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com
-
- 12 8月, 2022 1 次提交
-
-
由 Jane Malalane 提交于
Implement support for the HVMOP_set_evtchn_upcall_vector hypercall in order to set the per-vCPU event channel vector callback on Linux and use it in preference of HVM_PARAM_CALLBACK_IRQ. If the per-VCPU vector setup is successful on BSP, use this method for the APs. If not, fallback to the global vector-type callback. Also register callback_irq at per-vCPU event channel setup to trick toolstack to think the domain is enlightened. Suggested-by: N"Roger Pau Monné" <roger.pau@citrix.com> Signed-off-by: NJane Malalane <jane.malalane@citrix.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220729070416.23306-1-jane.malalane@citrix.comSigned-off-by: NJuergen Gross <jgross@suse.com>
-
- 11 8月, 2022 13 次提交
-
-
由 Nick Desaulniers 提交于
Users of GNU ld (BFD) from binutils 2.39+ will observe multiple instances of a new warning when linking kernels in the form: ld: warning: arch/x86/boot/pmjump.o: missing .note.GNU-stack section implies executable stack ld: NOTE: This behaviour is deprecated and will be removed in a future version of the linker ld: warning: arch/x86/boot/compressed/vmlinux has a LOAD segment with RWX permissions Generally, we would like to avoid the stack being executable. Because there could be a need for the stack to be executable, assembler sources have to opt-in to this security feature via explicit creation of the .note.GNU-stack feature (which compilers create by default) or command line flag --noexecstack. Or we can simply tell the linker the production of such sections is irrelevant and to link the stack as --noexecstack. LLVM's LLD linker defaults to -z noexecstack, so this flag isn't strictly necessary when linking with LLD, only BFD, but it doesn't hurt to be explicit here for all linkers IMO. --no-warn-rwx-segments is currently BFD specific and only available in the current latest release, so it's wrapped in an ld-option check. While the kernel makes extensive usage of ELF sections, it doesn't use permissions from ELF segments. Link: https://lore.kernel.org/linux-block/3af4127a-f453-4cf7-f133-a181cce06f73@kernel.dk/ Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=ba951afb99912da01a6e8434126b8fac7aa75107 Link: https://github.com/llvm/llvm-project/issues/57009Reported-and-tested-by: NJens Axboe <axboe@kernel.dk> Suggested-by: NFangrui Song <maskray@google.com> Signed-off-by: NNick Desaulniers <ndesaulniers@google.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sean Christopherson 提交于
Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written by host userspace, zero out the number of LBR records for a vCPU during PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of handling the check at run-time. guest_cpuid_has() is expensive due to the linear search of guest CPUID entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_ simply enumerating the same "Model" as the host causes KVM to set the number of LBR records to a non-zero value. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Turn vcpu_to_lbr_desc() and vcpu_to_lbr_records() into functions in order to provide type safety, to document exactly what they return, and to allow consuming the helpers in vmx.h. Move the definitions as necessary (the macros "reference" to_vmx() before its definition). No functional change intended. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Refresh the PMU if userspace modifies MSR_IA32_PERF_CAPABILITIES. KVM consumes the vCPU's PERF_CAPABILITIES when enumerating PEBS support, but relies on CPUID updates to refresh the PMU. I.e. KVM will do the wrong thing if userspace stuffs PERF_CAPABILITIES _after_ setting guest CPUID. Opportunistically fix a curly-brace indentation. Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Cc: Like Xu <like.xu.linux@gmail.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220727233424.2968356-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add compile-time and init-time sanity checks to ensure that the MMIO SPTE mask doesn't overlap the MMIO SPTE generation or the MMU-present bit. The generation currently avoids using bit 63, but that's as much coincidence as it is strictly necessarly. That will change in the future, as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask. Explicitly carve out the bits that are allowed in the mask so that any future shuffling of SPTE bits doesn't silently break MMIO caching (KVM has broken MMIO caching more than once due to overlapping the generation with other things). Suggested-by: NKai Huang <kai.huang@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NKai Huang <kai.huang@intel.com> Message-Id: <20220805194133.86299-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Mingwei Zhang 提交于
Rename the tracepoint function from trace_kvm_async_pf_doublefault() to trace_kvm_async_pf_repeated_fault() to make it clear, since double fault has nothing to do with this trace function. Asynchronous Page Fault (APF) is an artifact generated by KVM when it cannot find a physical page to satisfy an EPT violation. KVM uses APF to tell the guest OS to do something else such as scheduling other guest processes to make forward progress. However, when another guest process also touches a previously APFed page, KVM halts the vCPU instead of generating a repeated APF to avoid wasting cycles. Double fault (#DF) clearly has a different meaning and a different consequence when triggered. #DF requires two nested contributory exceptions instead of two page faults faulting at the same address. A prevous bug on APF indicates that it may trigger a double fault in the guest [1] and clearly this trace function has nothing to do with it. So rename this function should be a valid choice. No functional change intended. [1] https://www.spinics.net/lists/kvm/msg214957.htmlSigned-off-by: NMingwei Zhang <mizhang@google.com> Message-Id: <20220807052141.69186-1-mizhang@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Coleman Dietsch 提交于
Stop Xen timer (if it's running) prior to changing the IRQ vector and potentially (re)starting the timer. Changing the IRQ vector while the timer is still running can result in KVM injecting a garbage event, e.g. vm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from a previous timer but inject the new xen.timer_virq. Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode") Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42 Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com Signed-off-by: NColeman Dietsch <dietschc@csp.edu> Reviewed-by: NSean Christopherson <seanjc@google.com> Acked-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220808190607.323899-3-dietschc@csp.edu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Coleman Dietsch 提交于
Add a check for existing xen timers before initializing a new one. Currently kvm_xen_init_timer() is called on every KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG crash when vcpu->arch.xen.timer is already set. ODEBUG: init active (active state 0) object type: hrtimer hint: xen_timer_callbac0 RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502 Call Trace: __debug_object_init debug_hrtimer_init debug_init hrtimer_init kvm_xen_init_timer kvm_xen_vcpu_set_attr kvm_arch_vcpu_ioctl kvm_vcpu_ioctl vfs_ioctl Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode") Cc: stable@vger.kernel.org Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42 Reported-by: syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com Signed-off-by: NColeman Dietsch <dietschc@csp.edu> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220808190607.323899-2-dietschc@csp.edu> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs generating #NPF(RSVD), which are reflected by the CPU into the guest as a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access to the guest instruction stream or register state and so can't directly emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT), and those flavors of #NPF cause automatic VM-Exits, not #VC. Adjust KVM's MMIO masks to account for the C-bit location prior to doing SEV(-ES) setup, and document that dependency between adjusting the MMIO SPTE mask and SEV(-ES) setup. Fixes: b09763da ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)") Reported-by: NMichael Roth <michael.roth@amd.com> Tested-by: NMichael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Fully re-evaluate whether or not MMIO caching can be enabled when SPTE masks change; simply clearing enable_mmio_caching when a configuration isn't compatible with caching fails to handle the scenario where the masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit location, and toggle compatibility from false=>true. Snapshot the original module param so that re-evaluating MMIO caching preserves userspace's desire to allow caching. Use a snapshot approach so that enable_mmio_caching still reflects KVM's actual behavior. Fixes: 8b9e74bf ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled") Reported-by: NMichael Roth <michael.roth@amd.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: stable@vger.kernel.org Tested-by: NMichael Roth <michael.roth@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NKai Huang <kai.huang@intel.com> Message-Id: <20220803224957.1285926-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Mark kvm_mmu_x86_module_init() with __init, the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization. Fixes: 1d0e8480 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded") Cc: stable@vger.kernel.org Reviewed-by: NKai Huang <kai.huang@intel.com> Tested-by: NMichael Roth <michael.roth@amd.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220803224957.1285926-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Michal Luczaj 提交于
The emulator mishandles LEA with register source operand. Even though such LEA is illegal, it can be encoded and fed to CPU. In which case real hardware throws #UD. The emulator, instead, returns address of x86_emulate_ctxt._regs. This info leak hurts host's kASLR. Tell the decoder that illegal LEA is not to be emulated. Signed-off-by: NMichal Luczaj <mhal@rbox.co> Message-Id: <20220729134801.1120-1-mhal@rbox.co> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Yu Zhang 提交于
kvm_fixup_and_inject_pf_error() was introduced to fixup the error code( e.g., to add RSVD flag) and inject the #PF to the guest, when guest MAXPHYADDR is smaller than the host one. When it comes to nested, L0 is expected to intercept and fix up the #PF and then inject to L2 directly if - L2.MAXPHYADDR < L0.MAXPHYADDR and - L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the same MAXPHYADDR value && L1 is using EPT for L2), instead of constructing a #PF VM Exit to L1. Currently, with PFEC_MASK and PFEC_MATCH both set to 0 in vmcs02, the interception and injection may happen on all L2 #PFs. However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error() may cause the fault.async_page_fault being NOT zeroed, and later the #PF being treated as a nested async page fault, and then being injected to L1. Instead of zeroing 'fault' at the beginning of this function, we mannually set the value of 'fault.async_page_fault', because false is the value we really expect. Fixes: 89786147 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection") Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178Reported-by: NYang Lixiao <lixiao.yang@intel.com> Signed-off-by: NYu Zhang <yu.c.zhang@linux.intel.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-