- 12 11月, 2019 4 次提交
-
-
由 Joao Martins 提交于
When vCPU enters block phase, pi_pre_block() inserts vCPU to a per pCPU linked list of all vCPUs that are blocked on this pCPU. Afterwards, it changes PID.NV to POSTED_INTR_WAKEUP_VECTOR which its handler (wakeup_handler()) is responsible to kick (unblock) any vCPU on that linked list that now has pending posted interrupts. While vCPU is blocked (in kvm_vcpu_block()), it may be preempted which will cause vmx_vcpu_pi_put() to set PID.SN. If later the vCPU will be scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear PID.SN but will also *overwrite PID.NDST to this different pCPU*. Instead of keeping it with original pCPU which vCPU had entered block phase on. This results in an issue because when a posted interrupt is delivered, as the wakeup_handler() will be executed and fail to find blocked vCPU on its per pCPU linked list of all vCPUs that are blocked on this pCPU. Which is due to the vCPU being placed on a *different* per pCPU linked list i.e. the original pCPU in which it entered block phase. The regression is introduced by commit c112b5f5 ("KVM: x86: Recompute PID.ON when clearing PID.SN"). Therefore, partially revert it and reintroduce the condition in vmx_vcpu_pi_load() responsible for avoiding changing PID.NDST when loading a blocked vCPU. Fixes: c112b5f5 ("KVM: x86: Recompute PID.ON when clearing PID.SN") Tested-by: NNathan Ni <nathan.ni@oracle.com> Co-developed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Joao Martins 提交于
Commit 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU") introduced vmx_dy_apicv_has_pending_interrupt() in order to determine if a vCPU have a pending posted interrupt. This routine is used by kvm_vcpu_on_spin() when searching for a a new runnable vCPU to schedule on pCPU instead of a vCPU doing busy loop. vmx_dy_apicv_has_pending_interrupt() determines if a vCPU has a pending posted interrupt solely based on PID.ON. However, when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN which cause raised posted interrupts to only set bit in PID.PIR without setting PID.ON (and without sending notification vector), as depicted in VT-d manual section 5.2.3 "Interrupt-Posting Hardware Operation". Therefore, checking PID.ON is insufficient to determine if a vCPU has pending posted interrupts and instead we should also check if there is some bit set on PID.PIR if PID.SN=1. Fixes: 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU") Reviewed-by: NJagannathan Raman <jag.raman@oracle.com> Co-developed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
The Outstanding Notification (ON) bit is part of the Posted Interrupt Descriptor (PID) as opposed to the Posted Interrupts Register (PIR). The latter is a bitmap for pending vectors. Reviewed-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
The three MSR lists(msrs_to_save[], emulated_msrs[] and msr_based_features[]) are global arrays of kvm.ko, which are adjusted (copy supported MSRs forward to override the unsupported MSRs) when insmod kvm-{intel,amd}.ko, but it doesn't reset these three arrays to their initial value when rmmod kvm-{intel,amd}.ko. Thus, at the next installation, kvm-{intel,amd}.ko will do operations on the modified arrays with some MSRs lost and some MSRs duplicated. So define three constant arrays to hold the initial MSR lists and initialize msrs_to_save[], emulated_msrs[] and msr_based_features[] based on the constant arrays. Cc: stable@vger.kernel.org Reviewed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> [Remove now useless conditionals. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 05 11月, 2019 3 次提交
-
-
由 Michael Zhivich 提交于
The introduction of clocksource_tsc_early broke the functionality of "tsc=reliable" and "tsc=nowatchdog" command line parameters, since clocksource_tsc_early is unconditionally registered with CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list. This can cause the TSC to be declared unstable during boot: clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc-early' as unstable because the skew is too large: clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d mask: ffffffff clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6 mask: ffffffffffffffff tsc: Marking TSC unstable due to clocksource watchdog The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so the watchdog differs from TSC by 0.84 seconds. This happens when HPET is not available and jiffies are used as the TSC watchdog instead and the jiffies update is not happening due to lost timer interrupts in periodic mode, which can happen e.g. with expensive debug mechanisms enabled or under massive overload conditions in virtualized environments. Before the introduction of the early TSC clocksource the command line parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around this issue. Restore the behaviour by disabling the watchdog if requested on the kernel command line. [ tglx: Clarify changelog ] Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource") Signed-off-by: NMichael Zhivich <mzhivich@akamai.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
-
由 Thomas Gleixner 提交于
Cyrill reported the following crash: BUG: unable to handle page fault for address: 0000000000001ff0 #PF: supervisor read access in kernel mode RIP: 0010:get_stack_info+0xb3/0x148 It turns out that if the stack tracer is invoked before the exception stack mappings are initialized in_exception_stack() can erroneously classify an invalid address as an address inside of an exception stack: begin = this_cpu_read(cea_exception_stacks); <- 0 end = begin + sizeof(exception stacks); i.e. any address between 0 and end will be considered as exception stack address and the subsequent code will then try to derefence the resulting stack frame at a non mapped address. end = begin + (unsigned long)ep->size; ==> end = 0x2000 regs = (struct pt_regs *)end - 1; ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0 info->next_sp = (unsigned long *)regs->sp; ==> Crashes due to accessing 0x1ff0 Prevent this by checking the validity of the cea_exception_stack base address and bailing out if it is zero. Fixes: afcd21da ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist") Reported-by: NCyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NCyrill Gorcunov <gorcunov@gmail.com> Acked-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de
-
由 Jan Beulich 提交于
The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id which was initialized in get_smp_config() with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization the check now can trigger on bigsmp_32 APIC systems emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode. Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode. [ tglx: Massaged changelog and added Cc stable ] Fixes: bae3a8d3 ("x86/apic: Do not initialize LDR and DFR for bigsmp") Signed-off-by: NJan Beulich <jbeulich@suse.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
-
- 04 11月, 2019 1 次提交
-
-
由 Xiaochen Shen 提交于
When a mon group is being deleted, rdtgrp->flags is set to RDT_DELETED in rdtgroup_rmdir_mon() firstly. The structure of rdtgrp will be freed until rdtgrp->waitcount is dropped to 0 in rdtgroup_kn_unlock() later. During the window of deleting a mon group, if an application calls rdtgroup_mondata_show() to read mondata under this mon group, 'rdtgrp' returned from rdtgroup_kn_lock_live() is a NULL pointer when rdtgrp->flags is RDT_DELETED. And then 'rdtgrp' is passed in this path: rdtgroup_mondata_show() --> mon_event_read() --> mon_event_count(). Thus it results in NULL pointer dereference in mon_event_count(). Check 'rdtgrp' in rdtgroup_mondata_show(), and return -ENOENT immediately when reading mondata during the window of deleting a mon group. Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data") Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NFenghua Yu <fenghua.yu@intel.com> Reviewed-by: NTony Luck <tony.luck@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: pei.p.jia@intel.com Cc: Reinette Chatre <reinette.chatre@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1572326702-27577-1-git-send-email-xiaochen.shen@intel.com
-
- 31 10月, 2019 2 次提交
-
-
由 Paolo Bonzini 提交于
VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors. Cc: stable@vger.kernel.org Cc: Joerg Roedel <jroedel@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Kairui Song 提交于
Currently, kernel fails to boot on some HyperV VMs when using EFI. And it's a potential issue on all x86 platforms. It's caused by broken kernel relocation on EFI systems, when below three conditions are met: 1. Kernel image is not loaded to the default address (LOAD_PHYSICAL_ADDR) by the loader. 2. There isn't enough room to contain the kernel, starting from the default load address (eg. something else occupied part the region). 3. In the memmap provided by EFI firmware, there is a memory region starts below LOAD_PHYSICAL_ADDR, and suitable for containing the kernel. EFI stub will perform a kernel relocation when condition 1 is met. But due to condition 2, EFI stub can't relocate kernel to the preferred address, so it fallback to ask EFI firmware to alloc lowest usable memory region, got the low region mentioned in condition 3, and relocated kernel there. It's incorrect to relocate the kernel below LOAD_PHYSICAL_ADDR. This is the lowest acceptable kernel relocation address. The first thing goes wrong is in arch/x86/boot/compressed/head_64.S. Kernel decompression will force use LOAD_PHYSICAL_ADDR as the output address if kernel is located below it. Then the relocation before decompression, which move kernel to the end of the decompression buffer, will overwrite other memory region, as there is no enough memory there. To fix it, just don't let EFI stub relocate the kernel to any address lower than lowest acceptable address. [ ardb: introduce efi_low_alloc_above() to reduce the scope of the change ] Signed-off-by: NKairui Song <kasong@redhat.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: https://lkml.kernel.org/r/20191029173755.27149-6-ardb@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 10月, 2019 3 次提交
-
-
由 Kan Liang 提交于
The events in the same group don't start or stop simultaneously. Here is the ftrace when enabling event group for uncore_iio_0: # perf stat -e "{uncore_iio_0/event=0x1/,uncore_iio_0/event=0xe/}" <idle>-0 [000] d.h. 8959.064832: read_msr: a41, value b2b0b030 //Read counter reg of IIO unit0 counter0 <idle>-0 [000] d.h. 8959.064835: write_msr: a48, value 400001 //Write Ctrl reg of IIO unit0 counter0 to enable counter0. <------ Although counter0 is enabled, Unit Ctrl is still freezed. Nothing will count. We are still good here. <idle>-0 [000] d.h. 8959.064836: read_msr: a40, value 30100 //Read Unit Ctrl reg of IIO unit0 <idle>-0 [000] d.h. 8959.064838: write_msr: a40, value 30000 //Write Unit Ctrl reg of IIO unit0 to enable all counters in the unit by clear Freeze bit <------Unit0 is un-freezed. Counter0 has been enabled. Now it starts counting. But counter1 has not been enabled yet. The issue starts here. <idle>-0 [000] d.h. 8959.064846: read_msr: a42, value 0 //Read counter reg of IIO unit0 counter1 <idle>-0 [000] d.h. 8959.064847: write_msr: a49, value 40000e //Write Ctrl reg of IIO unit0 counter1 to enable counter1. <------ Now, counter1 just starts to count. Counter0 has been running for a while. Current code un-freezes the Unit Ctrl right after the first counter is enabled. The subsequent group events always loses some counter values. Implement pmu_enable and pmu_disable support for uncore, which can help to batch hardware accesses. No one uses uncore_enable_box and uncore_disable_box. Remove them. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: linux-drivers-review@eclists.intel.com Cc: linux-perf@eclists.intel.com Fixes: 087bfbb0 ("perf/x86: Add generic Intel uncore PMU support") Link: https://lkml.kernel.org/r/1572014593-31591-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kim Phillips 提交于
This saves us writing the IBS control MSR twice when disabling the event. I searched revision guides for all families since 10h, and did not find occurrence of erratum #420, nor anything remotely similar: so we isolate the secondary MSR write to family 10h only. Also unconditionally update the count mask for IBS Op implementations that have read & writeable current count (CurCnt) fields in addition to the MaxCnt field. These bits were reserved on prior implementations, and therefore shouldn't have negative impact. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: c9574fe0 ("perf/x86-ibs: Implement workaround for IBS erratum #420") Link: https://lkml.kernel.org/r/20191023150955.30292-2-kim.phillips@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kim Phillips 提交于
The loop that reads all the IBS MSRs into *buf stopped one MSR short of reading the IbsOpData register, which contains the RipInvalid status bit. Fix the offset_max assignment so the MSR gets read, so the RIP invalid evaluation is based on what the IBS h/w output, instead of what was left in memory. Signed-off-by: NKim Phillips <kim.phillips@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: d47e8238 ("perf/x86-ibs: Take instruction pointer from ibs sample") Link: https://lkml.kernel.org/r/20191023150955.30292-1-kim.phillips@amd.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 25 10月, 2019 1 次提交
-
-
由 Juergen Gross 提交于
Support for the kernel as Xen 32-bit PV guest will soon be removed. Issue a warning when booted as such. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
-
- 23 10月, 2019 2 次提交
-
-
由 Jim Mattson 提交于
If the "virtualize APIC accesses" VM-execution control is set in the VMCS, the APIC virtualization hardware is triggered when a page walk in VMX non-root mode terminates at a PTE wherein the address of the 4k page frame matches the APIC-access address specified in the VMCS. On hardware, the APIC-access address may be any valid 4k-aligned physical address. KVM's nVMX implementation enforces the additional constraint that the APIC-access address specified in the vmcs12 must be backed by a "struct page" in L1. If not, L0 will simply clear the "virtualize APIC accesses" VM-execution control in the vmcs02. The problem with this approach is that the L1 guest has arranged the vmcs12 EPT tables--or shadow page tables, if the "enable EPT" VM-execution control is clear in the vmcs12--so that the L2 guest physical address(es)--or L2 guest linear address(es)--that reference the L2 APIC map to the APIC-access address specified in the vmcs12. Without the "virtualize APIC accesses" VM-execution control in the vmcs02, the APIC accesses in the L2 guest will directly access the APIC-access page in L1. When there is no mapping whatsoever for the APIC-access address in L1, the L2 VM just loses the intended APIC virtualization. However, when the APIC-access address is mapped to an MMIO region in L1, the L2 guest gets direct access to the L1 MMIO device. For example, if the APIC-access address specified in the vmcs12 is 0xfee00000, then L2 gets direct access to L1's APIC. Since this vmcs12 configuration is something that KVM cannot faithfully emulate, the appropriate response is to exit to userspace with KVM_INTERNAL_ERROR_EMULATION. Fixes: fe3ef05c ("KVM: nVMX: Prepare vmcs02 from vmcs01 and vmcs12") Reported-by: NDan Cross <dcross@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Miaohe Lin 提交于
Guest physical APIC ID may not equal to vcpu->vcpu_id in some case. We may set the wrong physical id in avic_handle_ldr_update as we always use vcpu->vcpu_id. Get physical APIC ID from vAPIC page instead. Export and use kvm_xapic_id here and in avic_handle_apic_id_update as suggested by Vitaly. Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 10月, 2019 6 次提交
-
-
由 Paolo Bonzini 提交于
After resetting the vCPU, the kvmclock MSR keeps the previous value but it is not enabled. This can be confusing, so fix it. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 kbuild test robot 提交于
Use BUG_ON instead of a if condition followed by BUG. Generated by: scripts/coccinelle/misc/bugon.cocci Fixes: 4b526de5 ("KVM: x86: Check kvm_rebooting in kvm_spurious_fault()") CC: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Nkbuild test robot <lkp@intel.com> Signed-off-by: NJulia Lawall <julia.lawall@lip6.fr> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Liran Alon 提交于
Commit bf653b78 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") introduced specialized handling of specific exit-reasons that should not be raised by CPU because KVM configures VMCS such that they should never be raised. However, since commit 7396d337 ("KVM: x86: Return to userspace with internal error on unexpected exit reason"), VMX & SVM exit handlers were modified to generically handle all unexpected exit-reasons by returning to userspace with internal error. Therefore, there is no need for specialized handling of specific unexpected exit-reasons (This specialized handling also introduced inconsistency for these exit-reasons to silently skip guest instruction instead of return to userspace on internal-error). Fixes: bf653b78 ("KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit") Signed-off-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
When the RDPID instruction is supported on the host, enumerate it in KVM_GET_SUPPORTED_CPUID. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Thomas Hellstrom 提交于
The platform detection VMWARE_PORT macro uses the VMWARE_HYPERVISOR_PORT definition, but expects it to be an integer. However, when it was moved to the new vmware.h include file, it was changed to be a string to better fit into the VMWARE_HYPERCALL set of macros. This obviously breaks the platform detection VMWARE_PORT functionality. Change the VMWARE_HYPERVISOR_PORT and VMWARE_HYPERVISOR_PORT_HB definitions to be integers, and use __stringify() for their stringified form when needed. Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: b4dd4f6e ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-3-thomas_os@shipmail.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Hellstrom 提交于
LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified. This was previously fixed for the VMWARE_PORT macro. Fix it also for the VMWARE_HYPERCALL macro. Suggested-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NThomas Hellstrom <thellstrom@vmware.com> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: clang-built-linux@googlegroups.com Fixes: b4dd4f6e ("Add a header file for hypercall definitions") Link: https://lkml.kernel.org/r/20191021172403.3085-2-thomas_os@shipmail.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 20 10月, 2019 1 次提交
-
-
由 Jiri Olsa 提交于
Jan reported failing ltp test for PT: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/tracing/pt_test/pt_test.c It looks like the reason is this new commit added in this v5.4 merge window: 38bb8d77 ("perf/x86/intel/pt: Split ToPA metadata and page layout") which did not keep the TOPA_SHIFT for entry base. Add it back. Reported-by: NJan Stancek <jstancek@redhat.com> Signed-off-by: NJiri Olsa <jolsa@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Petlan <mpetlan@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Fixes: 38bb8d77 ("perf/x86/intel/pt: Split ToPA metadata and page layout") Link: https://lkml.kernel.org/r/20191019220726.12213-1-jolsa@kernel.org [ Minor changelog edits. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 10月, 2019 2 次提交
-
-
由 Zhenzhong Duan 提交于
When building with "EXTRA_CFLAGS=-Wall" gcc warns: arch/x86/boot/compressed/acpi.c:29:30: warning: get_cmdline_acpi_rsdp defined but not used [-Wunused-function] get_cmdline_acpi_rsdp() is only used when CONFIG_RANDOMIZE_BASE and CONFIG_MEMORY_HOTREMOVE are both enabled, so any build where one of these config options is disabled has this issue. Move the function under the same ifdef guard as the call site. [ tglx: Add context to the changelog so it becomes useful ] Fixes: 41fa1ee9 ("acpi: Ignore acpi_rsdp kernel param when the kernel has been locked down") Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1569719633-32164-1-git-send-email-zhenzhong.duan@oracle.com
-
由 Andrea Parri 提交于
Michael reported that the x86/hyperv initialization code prints the following dmesg when running in a VM on Hyper-V: [ 0.000738] Booting paravirtualized kernel on bare hardware Let the x86/hyperv initialization code set pv_info.name to "Hyper-V" so dmesg reports correctly: [ 0.000172] Booting paravirtualized kernel on Hyper-V [ tglx: Folded build fix provided by Yue ] Reported-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NAndrea Parri <parri.andrea@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NWei Liu <wei.liu@kernel.org> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Cc: YueHaibing <yuehaibing@huawei.com> Link: https://lkml.kernel.org/r/20191015103502.13156-1-parri.andrea@gmail.com
-
- 15 10月, 2019 2 次提交
-
-
由 Sean Christopherson 提交于
Check that the per-cpu cluster mask pointer has been set prior to clearing a dying cpu's bit. The per-cpu pointer is not set until the target cpu reaches smp_callin() during CPUHP_BRINGUP_CPU, whereas the teardown function, x2apic_dead_cpu(), is associated with the earlier CPUHP_X2APIC_PREPARE. If an error occurs before the cpu is awakened, e.g. if do_boot_cpu() itself fails, x2apic_dead_cpu() will dereference the NULL pointer and cause a panic. smpboot: do_boot_cpu failed(-22) to wakeup CPU#1 BUG: kernel NULL pointer dereference, address: 0000000000000008 RIP: 0010:x2apic_dead_cpu+0x1a/0x30 Call Trace: cpuhp_invoke_callback+0x9a/0x580 _cpu_up+0x10d/0x140 do_cpu_up+0x69/0xb0 smp_init+0x63/0xa9 kernel_init_freeable+0xd7/0x229 ? rest_init+0xa0/0xa0 kernel_init+0xa/0x100 ret_from_fork+0x35/0x40 Fixes: 023a6117 ("x86/apic/x2apic: Simplify cluster management") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191001205019.5789-1-sean.j.christopherson@intel.com
-
由 Roman Kagan 提交于
Now that there's Hyper-V IOMMU driver, Linux can switch to x2apic mode when supported by the vcpus. However, the apic access functions for Hyper-V enlightened apic assume xapic mode only. As a result, Linux fails to bring up secondary cpus when run as a guest in QEMU/KVM with both hv_apic and x2apic enabled. According to Michael Kelley, when in x2apic mode, the Hyper-V synthetic apic MSRs behave exactly the same as the corresponding architectural x2apic MSRs, so there's no need to override the apic accessors. The only exception is hv_apic_eoi_write, which benefits from lazy EOI when available; however, its implementation works for both xapic and x2apic modes. Fixes: 29217a47 ("iommu/hyper-v: Add Hyper-V stub IOMMU driver") Fixes: 6b48cb5f ("X86/Hyper-V: Enlighten APIC access") Suggested-by: NMichael Kelley <mikelley@microsoft.com> Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NMichael Kelley <mikelley@microsoft.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20191010123258.16919-1-rkagan@virtuozzo.com
-
- 12 10月, 2019 10 次提交
-
-
由 Kan Liang 提交于
Tiger Lake is the followon to Ice Lake. From the perspective of Intel cstate residency counters, there is nothing changed compared with Ice Lake. Share icl_cstates with Ice Lake. Update the comments for Tiger Lake. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-10-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Tiger Lake is the followon to Ice Lake. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-9-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Tiger Lake is the followon to Ice Lake. From the perspective of Intel core PMU, there is little changes compared with Ice Lake, e.g. small changes in event list. But it doesn't impact on core PMU functionality. Share the perf code with Ice Lake. The event list patch will be submitted later separately. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-8-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
There is no Core C3 C-State counter for Ice Lake. Package C8/C9/C10 C-State counters are added for Ice Lake. Introduce a new event list, icl_cstates, for Ice Lake. Update the comments accordingly. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: f08c47d1 ("perf/x86/intel/cstate: Add Icelake support") Link: https://lkml.kernel.org/r/1570549810-25049-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
PPERF and SMI_COUNT MSRs are also supported by Ice Lake desktop and server. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel cstate residency counters, there is nothing changed compared with Kaby Lake. Share hswult_cstates with Kaby Lake. Update the comments for Comet Lake. Kaby Lake is missed in the comments for some Residency Counters. Update the comments for Kaby Lake as well. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-5-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Comet Lake is the new 10th Gen Intel processor. PPERF and SMI_COUNT MSRs are also supported. The External Design Specification (EDS) is not published yet. It comes from an authoritative internal source. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-4-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Comet Lake is the new 10th Gen Intel processor. From the perspective of Intel PMU, there is nothing changed compared with Sky Lake. Share the perf code with Sky Lake. The patch has been tested on real hardware. Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/1570549810-25049-3-git-send-email-kan.liang@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steve Wahl 提交于
The kernel image map is created using PMD pages, which can include some extra space beyond what's actually needed. Round the size of the memory hole we search for up to the next PMD boundary, to be certain all of the space to be mapped is usable RAM and includes no reserved areas. Signed-off-by: NSteve Wahl <steve.wahl@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dimitri.sivanich@hpe.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jordan Borgner <mail@jordan-borgner.de> Cc: Juergen Gross <jgross@suse.com> Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/df4f49f05c0c27f108234eb93db5c613d09ea62e.1569358539.git.steve.wahl@hpe.com
-
由 Steve Wahl 提交于
Our hardware (UV aka Superdome Flex) has address ranges marked reserved by the BIOS. Access to these ranges is caught as an error, causing the BIOS to halt the system. Initial page tables mapped a large range of physical addresses that were not checked against the list of BIOS reserved addresses, and sometimes included reserved addresses in part of the mapped range. Including the reserved range in the map allowed processor speculative accesses to the reserved range, triggering a BIOS halt. Used early in booting, the page table level2_kernel_pgt addresses 1 GiB divided into 2 MiB pages, and it was set up to linearly map a full 1 GiB of physical addresses that included the physical address range of the kernel image, as chosen by KASLR. But this also included a large range of unused addresses on either side of the kernel image. And unlike the kernel image's physical address range, this extra mapped space was not checked against the BIOS tables of usable RAM addresses. So there were times when the addresses chosen by KASLR would result in processor accessible mappings of BIOS reserved physical addresses. The kernel code did not directly access any of this extra mapped space, but having it mapped allowed the processor to issue speculative accesses into reserved memory, causing system halts. This was encountered somewhat rarely on a normal system boot, and much more often when starting the crash kernel if "crashkernel=512M,high" was specified on the command line (this heavily restricts the physical address of the crash kernel, in our case usually within 1 GiB of reserved space). The solution is to invalidate the pages of this table outside the kernel image's space before the page table is activated. It fixes this problem on our hardware. [ bp: Touchups. ] Signed-off-by: NSteve Wahl <steve.wahl@hpe.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NDave Hansen <dave.hansen@linux.intel.com> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Brijesh Singh <brijesh.singh@amd.com> Cc: dimitri.sivanich@hpe.com Cc: Feng Tang <feng.tang@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jordan Borgner <mail@jordan-borgner.de> Cc: Juergen Gross <jgross@suse.com> Cc: mike.travis@hpe.com Cc: russ.anderson@hpe.com Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Cc: Zhenzhong Duan <zhenzhong.duan@oracle.com> Link: https://lkml.kernel.org/r/9c011ee51b081534a7a15065b1681d200298b530.1569358539.git.steve.wahl@hpe.com
-
- 09 10月, 2019 2 次提交
-
-
由 Tom Lendacky 提交于
It turns out that the NMI latency workaround from commit: 6d3edaae ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up being too conservative and results in the perf NMI handler claiming NMIs too easily on AMD hardware when the NMI watchdog is active. This has an impact, for example, on the hpwdt (HPE watchdog timer) module. This module can produce an NMI that is used to reset the system. It registers an NMI handler for the NMI_UNKNOWN type and relies on the fact that nothing has claimed an NMI so that its handler will be invoked when the watchdog device produces an NMI. After the referenced commit, the hpwdt module is unable to process its generated NMI if the NMI watchdog is active, because the current NMI latency mitigation results in the NMI being claimed by the perf NMI handler. Update the AMD perf NMI latency mitigation workaround to, instead, use a window of time. Whenever a PMC is handled in the perf NMI handler, set a timestamp which will act as a perf NMI window. Any NMIs arriving within that window will be claimed by perf. Anything outside that window will not be claimed by perf. The value for the NMI window is set to 100 msecs. This is a conservative value that easily covers any NMI latency in the hardware. While this still results in a window in which the hpwdt module will not receive its NMI, the window is now much, much smaller. Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 6d3edaae ("x86/perf/amd: Resolve NMI latency issues for active PMCs") Link: https://lkml.kernel.org/r/Message-ID: Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kan Liang 提交于
Comet Lake is the new 10th Gen Intel processor. Add two new CPU model numbers to the Intel family list. The CPU model numbers are not published in the SDM yet but they come from an authoritative internal source. [ bp: Touch up commit message. ] Signed-off-by: NKan Liang <kan.liang@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Cc: ak@linux.intel.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/1570549810-25049-2-git-send-email-kan.liang@linux.intel.com
-
- 08 10月, 2019 1 次提交
-
-
由 Sami Tolvanen 提交于
LLVM's assembler doesn't accept the short form INL instruction: inl (%%dx) but instead insists on the output register to be explicitly specified: <inline asm>:1:7: error: invalid operand for instruction inl (%dx) ^ LLVM ERROR: Error parsing inline asm Use the full form of the instruction to fix the build. Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com> Reviewed-by: NKees Cook <keescook@chromium.org> Acked-by: NThomas Hellstrom <thellstrom@vmware.com> Cc: clang-built-linux@googlegroups.com Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: virtualization@lists.linux-foundation.org Cc: "VMware, Inc." <pv-drivers@vmware.com> Cc: x86-ml <x86@kernel.org> Link: https://github.com/ClangBuiltLinux/linux/issues/734 Link: https://lkml.kernel.org/r/20191007192129.104336-1-samitolvanen@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-