- 19 December 2019, 1 commit
-
-
Committed by Jim Mattson

The host reports support for the synthetic feature X86_FEATURE_SSBD when any of the three following hardware features are set:

  CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31]
  CPUID.80000008H:EBX.AMD_SSBD[bit 24]
  CPUID.80000008H:EBX.VIRT_SSBD[bit 25]

Either of the first two hardware features implies the existence of the IA32_SPEC_CTRL MSR, but CPUID.80000008H:EBX.VIRT_SSBD[bit 25] does not. Therefore, CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] should only be set in the guest if CPUID.(EAX=7,ECX=0):EDX.SSBD[bit 31] or CPUID.80000008H:EBX.AMD_SSBD[bit 24] is set on the host.

Fixes: 0c54914d ("KVM: x86: use Intel speculation bugs and features as derived in generic x86 code")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jacob Xu <jacobhxu@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
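A minimal sketch of the resulting host-side check, using the kernel's feature flags; the helper name is illustrative, not the actual kvm.ko code:

  /*
   * Illustrative sketch: SSBD should be exposed in the guest's
   * CPUID.(EAX=7,ECX=0):EDX only when the host feature implies that
   * IA32_SPEC_CTRL actually exists.
   */
  static bool host_backs_spec_ctrl_ssbd(void)
  {
          /* CPUID.(EAX=7,ECX=0):EDX.SSBD or CPUID.80000008H:EBX.AMD_SSBD */
          if (boot_cpu_has(X86_FEATURE_SPEC_CTRL_SSBD) ||
              boot_cpu_has(X86_FEATURE_AMD_SSBD))
                  return true;

          /* X86_FEATURE_VIRT_SSBD alone does not imply IA32_SPEC_CTRL */
          return false;
  }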
-
- 21 November 2019, 5 commits
-
-
Committed by Paolo Bonzini

If X86_FEATURE_RTM is disabled, the guest should not be able to access MSR_IA32_TSX_CTRL. We can therefore use it in KVM to force all transactions from the guest to abort.

Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

The current guest mitigation of TAA is both too heavy and not really sufficient. It is too heavy because it will cause some affected CPUs (those that have MDS_NO but lack TAA_NO) to fall back to VERW and get the corresponding slowdown. It is not really sufficient because it will cause the MDS_NO bit to disappear upon microcode update, so that VMs started before the microcode update will no longer be runnable afterwards, even with tsx=on.

Instead, if tsx=on on the host, we can emulate MSR_IA32_TSX_CTRL for the guest and let it run without the VERW mitigation. MSR_IA32_TSX_CTRL is quite heavyweight and we do not want to write it on every vmentry, but we can use the shared MSR functionality because the host kernel need not protect itself from TSX-based side-channels.

Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

Because KVM always emulates CPUID, the CPUID clear bit (bit 1) of MSR_IA32_TSX_CTRL must be emulated "manually" by the hypervisor when performing said emulation. Right now neither kvm-intel.ko nor kvm-amd.ko implements MSR_IA32_TSX_CTRL, but this will change in the next patch.

Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
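A hedged sketch of that manual emulation; guest_read_tsx_ctrl() is an assumed helper and the function is illustrative, not the actual KVM CPUID code:

  /*
   * Illustrative sketch: while emulating CPUID.(EAX=7,ECX=0), honour the
   * CPUID-clear bit of the guest's virtual MSR_IA32_TSX_CTRL by hiding
   * the HLE and RTM feature bits in EBX.
   */
  static void adjust_cpuid7_for_tsx_ctrl_sketch(struct kvm_vcpu *vcpu,
                                                u32 function, u32 index, u32 *ebx)
  {
          u64 tsx_ctrl;

          if (function != 7 || index != 0)
                  return;
          if (!guest_read_tsx_ctrl(vcpu, &tsx_ctrl) &&    /* assumed helper */
              (tsx_ctrl & TSX_CTRL_CPUID_CLEAR))
                  *ebx &= ~(BIT(4) | BIT(11));    /* HLE (bit 4), RTM (bit 11) */
  }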
-
Committed by Paolo Bonzini

"Shared MSRs" are guest MSRs that are written to the host MSRs but keep their value until the next return to userspace. They support a mask, so that some bits keep the host value, but this mask is only used to skip an unnecessary MSR write and the value written to the MSR is always the guest MSR.

Fix this and, while at it, do not update smsr->values[slot].curr if for whatever reason the wrmsr fails. This should only happen due to reserved bits, so the value written to smsr->values[slot].curr will not match when the user-return notifier runs, and the host value will always be restored. However, it is untidy, and in rare cases this can actually avoid spurious WRMSRs on return to userspace.

Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

KVM does not implement MSR_IA32_TSX_CTRL, so it must not be presented to the guests. It is also confusing to have !ARCH_CAP_TSX_CTRL_MSR && !RTM && ARCH_CAP_TAA_NO: the lack of MSR_IA32_TSX_CTRL suggests TSX was not hidden (it actually was), yet the value says that TSX is not vulnerable to microarchitectural data sampling. Fix both.

Cc: stable@vger.kernel.org
Tested-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
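A minimal sketch of the resulting filtering; host_arch_capabilities() stands in for however the host MSR value is obtained:

  /* Illustrative sketch, not the exact kvm.ko code. */
  static u64 guest_arch_capabilities_sketch(void)
  {
          u64 data = host_arch_capabilities();    /* assumed helper */

          /*
           * If RTM is hidden from guests and KVM does not emulate
           * MSR_IA32_TSX_CTRL, advertise neither TSX_CTRL nor TAA_NO.
           */
          if (!boot_cpu_has(X86_FEATURE_RTM))
                  data &= ~(ARCH_CAP_TSX_CTRL_MSR | ARCH_CAP_TAA_NO);

          return data;
  }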
-
- 14 November 2019, 1 commit
-
-
Committed by Sean Christopherson

Acquire the per-VM slots_lock when zapping all shadow pages as part of toggling nx_huge_pages. The fast zap algorithm relies on exclusivity (via slots_lock) to identify obsolete vs. valid shadow pages, because it uses a single bit for its generation number. Holding slots_lock also obviates the need to acquire a read lock on the VM's srcu.

Failing to take slots_lock when toggling nx_huge_pages allows multiple instances of kvm_mmu_zap_all_fast() to run concurrently, as the other user, KVM_SET_USER_MEMORY_REGION, does not take the global kvm_lock. (kvm_mmu_zap_all_fast() does take kvm->mmu_lock, but it can be temporarily dropped by kvm_zap_obsolete_pages(), so it is not enough to enforce exclusivity.)

Concurrent fast zap instances cause obsolete shadow pages to be incorrectly identified as valid due to the single-bit generation number wrapping, which results in stale shadow pages being left in KVM's MMU and leads to all sorts of undesirable behavior. The bug is easily confirmed by running with CONFIG_PROVE_LOCKING and toggling nx_huge_pages via its module param.

Note, until commit 4ae5acbc4936 ("KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()", 2019-11-13) the fast zap algorithm used an ulong-sized generation instead of relying on exclusivity for correctness, but all callers except the recently added set_nx_huge_pages() needed to hold slots_lock anyways. Therefore, this patch does not have to be backported to stable kernels.

Given that toggling nx_huge_pages is by no means a fast path, force it to conform to the current approach instead of reintroducing the previous generation count.

Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation", but NOT FOR STABLE)
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
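A sketch of the locking shape this implies for the module-parameter path (illustrative, not the exact function):

  /*
   * Illustrative sketch: when nx_huge_pages is toggled, walk every VM and
   * run the fast zap with that VM's slots_lock held, so only one zap
   * instance can run per VM.
   */
  static void zap_all_shadow_pages_sketch(void)
  {
          struct kvm *kvm;

          mutex_lock(&kvm_lock);
          list_for_each_entry(kvm, &vm_list, vm_list) {
                  mutex_lock(&kvm->slots_lock);
                  kvm_mmu_zap_all_fast(kvm);
                  mutex_unlock(&kvm->slots_lock);
          }
          mutex_unlock(&kvm_lock);
  }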
-
- 13 November 2019, 2 commits
-
-
Committed by Xiaoyao Li

When applying commit 7a5ee6ed ("KVM: X86: Fix initialization of MSR lists"), it forgot to reset the three MSR list count variables to 0 while removing the useless conditionals.

Fixes: 7a5ee6ed ("KVM: X86: Fix initialization of MSR lists")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

If a huge page is recovered (and becomes non-executable) while another thread is executing it, the resulting contention on mmu_lock can cause latency spikes. Disabling recovery for PREEMPT_RT kernels fixes this issue.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 November 2019, 6 commits
-
-
Committed by Sean Christopherson

Explicitly exempt ZONE_DEVICE pages from kvm_is_reserved_pfn() and instead manually handle ZONE_DEVICE on a case-by-case basis. For things like page refcounts, KVM needs to treat ZONE_DEVICE pages like normal pages, e.g. put pages grabbed via gup(). But for flows such as setting A/D bits or shifting refcounts for transparent huge pages, KVM needs to avoid processing ZONE_DEVICE pages as the flows in question lack the underlying machinery for proper handling of ZONE_DEVICE pages.

This fixes a hang reported by Adam Borowski[*] in dev_pagemap_cleanup() when running a KVM guest backed with /dev/dax memory, as KVM straight up doesn't put any references to ZONE_DEVICE pages acquired by gup().

Note, Dan Williams proposed an alternative solution of doing put_page() on ZONE_DEVICE pages immediately after gup() in order to simplify the auditing needed to ensure is_zone_device_page() is called if and only if the backing device is pinned (via gup()). But that approach would break kvm_vcpu_{un}map(), as KVM requires the page to be pinned from map() 'til unmap() when accessing guest memory, unlike KVM's secondary MMU, which coordinates with mmu_notifier invalidations to avoid creating stale page references, i.e. doesn't rely on pages being pinned.

[*] http://lkml.kernel.org/r/20190919115547.GA17963@angband.pl

Reported-by: Adam Borowski <kilobyte@angband.pl>
Analyzed-by: David Hildenbrand <david@redhat.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Fixes: 3565fce3 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
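A hedged sketch of how the reserved-PFN check can special-case ZONE_DEVICE pages along the lines described above (function names are illustrative):

  /*
   * Illustrative sketch: ZONE_DEVICE pages are PageReserved, but for
   * refcounting purposes KVM must treat them like normal pages.
   */
  static bool is_zone_device_pfn_sketch(kvm_pfn_t pfn)
  {
          return pfn_valid(pfn) && is_zone_device_page(pfn_to_page(pfn));
  }

  static bool is_reserved_pfn_sketch(kvm_pfn_t pfn)
  {
          if (pfn_valid(pfn))
                  return PageReserved(pfn_to_page(pfn)) &&
                         !is_zone_device_pfn_sketch(pfn);

          return true;
  }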
-
Committed by Joao Martins

Streamline the PID.PIR check and change its call sites to use the newly added helper.

Suggested-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Joao Martins

When a vCPU enters the block phase, pi_pre_block() inserts the vCPU into a per-pCPU linked list of all vCPUs that are blocked on this pCPU. Afterwards, it changes PID.NV to POSTED_INTR_WAKEUP_VECTOR, whose handler (wakeup_handler()) is responsible for kicking (unblocking) any vCPU on that linked list that now has pending posted interrupts.

While the vCPU is blocked (in kvm_vcpu_block()), it may be preempted, which will cause vmx_vcpu_pi_put() to set PID.SN. If the vCPU is later scheduled to run on a different pCPU, vmx_vcpu_pi_load() will clear PID.SN but will also *overwrite PID.NDST to this different pCPU*, instead of keeping it set to the original pCPU on which the vCPU entered the block phase. This results in an issue: when a posted interrupt is delivered, wakeup_handler() is executed but fails to find the blocked vCPU on its per-pCPU linked list of blocked vCPUs, because the vCPU was placed on a *different* per-pCPU linked list, i.e. that of the original pCPU on which it entered the block phase.

The regression is introduced by commit c112b5f5 ("KVM: x86: Recompute PID.ON when clearing PID.SN"). Therefore, partially revert it and reintroduce the condition in vmx_vcpu_pi_load() responsible for avoiding changing PID.NDST when loading a blocked vCPU.

Fixes: c112b5f5 ("KVM: x86: Recompute PID.ON when clearing PID.SN")
Tested-by: Nathan Ni <nathan.ni@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Joao Martins

Commit 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU") introduced vmx_dy_apicv_has_pending_interrupt() in order to determine whether a vCPU has a pending posted interrupt. This routine is used by kvm_vcpu_on_spin() when searching for a new runnable vCPU to schedule on the pCPU instead of a vCPU doing a busy loop.

vmx_dy_apicv_has_pending_interrupt() determines if a vCPU has a pending posted interrupt solely based on PID.ON. However, when a vCPU is preempted, vmx_vcpu_pi_put() sets PID.SN, which causes raised posted interrupts to only set the corresponding bit in PID.PIR without setting PID.ON (and without sending the notification vector), as described in VT-d manual section 5.2.3 "Interrupt-Posting Hardware Operation". Therefore, checking PID.ON is insufficient to determine if a vCPU has pending posted interrupts; we should also check whether some bit is set in PID.PIR when PID.SN=1.

Fixes: 17e433b5 ("KVM: Fix leak vCPU's VMCS value into other pCPU")
Reviewed-by: Jagannathan Raman <jag.raman@oracle.com>
Co-developed-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
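A sketch of the strengthened check; the pi_test_* accessors mirror the vmx posted-interrupt helpers, but the function itself is illustrative:

  /*
   * Illustrative sketch: a posted interrupt is pending if ON is set, or if
   * SN is set (the vCPU was preempted) and some vector bit is set in PIR.
   */
  static bool has_pending_posted_intr_sketch(struct pi_desc *pi_desc)
  {
          return pi_test_on(pi_desc) ||
                 (pi_test_sn(pi_desc) &&
                  !bitmap_empty((unsigned long *)pi_desc->pir, NR_VECTORS));
  }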
-
Committed by Liran Alon

The Outstanding Notification (ON) bit is part of the Posted Interrupt Descriptor (PID), as opposed to the Posted Interrupts Register (PIR). The latter is a bitmap of pending vectors.

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Chenyi Qiang

The three MSR lists (msrs_to_save[], emulated_msrs[] and msr_based_features[]) are global arrays of kvm.ko, which are adjusted (supported MSRs are copied forward over the unsupported ones) when kvm-{intel,amd}.ko is loaded, but they are not reset to their initial values when kvm-{intel,amd}.ko is removed. Thus, at the next module load, kvm-{intel,amd}.ko operates on the already-modified arrays, with some MSRs lost and some MSRs duplicated.

So define three constant arrays to hold the initial MSR lists and initialize msrs_to_save[], emulated_msrs[] and msr_based_features[] based on the constant arrays.

Cc: stable@vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
[Remove now useless conditionals. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
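A sketch of the rebuild-from-constants pattern; msr_is_supported() is an assumed predicate standing in for the real per-vendor capability checks, and the sketch shows only one of the three lists:

  /* Illustrative sketch: the pristine list lives in a const array. */
  static const u32 msrs_to_save_all[] = {
          MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
          /* ... */
  };

  static u32 msrs_to_save[ARRAY_SIZE(msrs_to_save_all)];
  static unsigned int num_msrs_to_save;

  static void init_msr_list_sketch(void)
  {
          unsigned int i;

          num_msrs_to_save = 0;   /* start from scratch on every module init */
          for (i = 0; i < ARRAY_SIZE(msrs_to_save_all); i++) {
                  if (!msr_is_supported(msrs_to_save_all[i]))     /* assumed */
                          continue;
                  msrs_to_save[num_msrs_to_save++] = msrs_to_save_all[i];
          }
  }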
-
- 07 November 2019, 2 commits
-
-
Committed by Josh Poimboeuf

For new IBRS_ALL CPUs, the Enhanced IBRS check at the beginning of cpu_bugs_smt_update() causes the function to return early, unintentionally skipping the MDS and TAA logic. This is not a problem for MDS, because there appears to be no overlap between IBRS_ALL and MDS-affected CPUs, so the MDS mitigation would be disabled and nothing would need to be done in this function anyway. But for TAA, the TAA_MSG_SMT string will never get printed on Cascade Lake and newer.

The check is superfluous anyway: when 'spectre_v2_enabled' is SPECTRE_V2_IBRS_ENHANCED, 'spectre_v2_user' is always SPECTRE_V2_USER_NONE, and so the 'spectre_v2_user' switch statement handles it appropriately by doing nothing. So just remove the check.

Fixes: 1b42f017 ("x86/speculation/taa: Add mitigation for TSX Async Abort")
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Tyler Hicks <tyhicks@canonical.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
-
Committed by Catalin Marinas

Following commit 73e86cb0 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()"), the PTE_RDONLY bit is no longer managed by set_pte_at() but built into the PAGE_* attribute definitions. Consequently, pte_same() must include this bit when checking two PTEs for equality. Remove the arm64-specific pte_same() function, practically reverting commit 747a70e6 ("arm64: Fix copy-on-write referencing in HugeTLB").

Fixes: 73e86cb0 ("arm64: Move PTE_RDONLY bit handling out of set_pte_at()")
Cc: <stable@vger.kernel.org> # 4.14.x-
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
-
- 06 November 2019, 4 commits
-
-
Committed by Amelie Delaunay

Pins used for the joystick are all configured as input. "push-pull" is not a valid setting for an input pin.

Fixes: a502b343 ("pinctrl: stmfx: update pinconf settings")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
-
Committed by Amelie Delaunay

The "push-pull" configuration is now fully handled by the gpiolib and the STMFX pinctrl driver. There is no longer a need to declare a pinctrl group just to configure the "push-pull" setting for the line; it is done directly by the gpiolib.

Fixes: a502b343 ("pinctrl: stmfx: update pinconf settings")
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
Signed-off-by: Amelie Delaunay <amelie.delaunay@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
-
Committed by Christophe Roullier

Split the 10-Kbyte CAN message RAM so that the FDCAN1 and FDCAN2 instances can be used simultaneously. The first 5 Kbytes are allocated to FDCAN1 and the last 5 Kbytes are used for FDCAN2. To do so, set the offset to 0x1400 in mram-cfg for FDCAN2.

Fixes: d44d6e02 ("ARM: dts: stm32: change CAN RAM mapping on stm32mp157c")
Signed-off-by: Christophe Roullier <christophe.roullier@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
-
Committed by Patrice Chotard

Relax the qspi pins slew-rate to minimize peak currents.

Fixes: 84403005 ("ARM: dts: stm32: add flash nor support on stm32mp157c eval board")
Signed-off-by: Patrice Chotard <patrice.chotard@st.com>
Signed-off-by: Alexandre Torgue <alexandre.torgue@st.com>
-
- 05 November 2019, 5 commits
-
-
Committed by Michael Zhivich

The introduction of clocksource_tsc_early broke the functionality of the "tsc=reliable" and "tsc=nowatchdog" command line parameters, since clocksource_tsc_early is unconditionally registered with CLOCK_SOURCE_MUST_VERIFY and thus put on the watchdog list. This can cause the TSC to be declared unstable during boot:

  clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc-early' as unstable because the skew is too large:
  clocksource: 'refined-jiffies' wd_now: fffb7018 wd_last: fffb6e9d mask: ffffffff
  clocksource: 'tsc-early' cs_now: 68a6a7070f6a0 cs_last: 68a69ab6f74d6 mask: ffffffffffffffff
  tsc: Marking TSC unstable due to clocksource watchdog

The corresponding elapsed times are cs_nsec=1224152026 and wd_nsec=378942392, so the watchdog differs from TSC by 0.84 seconds.

This happens when HPET is not available and jiffies are used as the TSC watchdog instead, and the jiffies update is not happening due to lost timer interrupts in periodic mode, which can happen e.g. with expensive debug mechanisms enabled or under massive overload conditions in virtualized environments.

Before the introduction of the early TSC clocksource the command line parameters "tsc=reliable" and "tsc=nowatchdog" could be used to work around this issue. Restore the behaviour by disabling the watchdog if requested on the kernel command line.

[ tglx: Clarify changelog ]

Fixes: aa83c457 ("x86/tsc: Introduce early tsc clocksource")
Signed-off-by: Michael Zhivich <mzhivich@akamai.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20191024175945.14338-1-mzhivich@akamai.com
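A minimal sketch of the restored behaviour: when either parameter is given, the early TSC clocksource is registered without the must-verify flag, so it never lands on the watchdog list (variable names follow the x86 tsc code but are combined here for illustration only):

  /*
   * Illustrative sketch: honour tsc=reliable / tsc=nowatchdog for the
   * early TSC clocksource as well, so it is never watchdog-verified.
   */
  if (tsc_clocksource_reliable || no_tsc_watchdog)
          clocksource_tsc_early.flags &= ~CLOCK_SOURCE_MUST_VERIFY;

  clocksource_register_khz(&clocksource_tsc_early, tsc_khz);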
-
Committed by Thomas Gleixner

Cyrill reported the following crash:

  BUG: unable to handle page fault for address: 0000000000001ff0
  #PF: supervisor read access in kernel mode
  RIP: 0010:get_stack_info+0xb3/0x148

It turns out that if the stack tracer is invoked before the exception stack mappings are initialized, in_exception_stack() can erroneously classify an invalid address as an address inside of an exception stack:

  begin = this_cpu_read(cea_exception_stacks);  <- 0
  end = begin + sizeof(exception stacks);

i.e. any address between 0 and end will be considered as an exception stack address and the subsequent code will then try to dereference the resulting stack frame at a non-mapped address:

  end = begin + (unsigned long)ep->size;
    ==> end = 0x2000

  regs = (struct pt_regs *)end - 1;
    ==> regs = 0x2000 - sizeof(struct pt_regs *) = 0x1ff0

  info->next_sp = (unsigned long *)regs->sp;
    ==> Crashes due to accessing 0x1ff0

Prevent this by checking the validity of the cea_exception_stacks base address and bailing out if it is zero.

Fixes: afcd21da ("x86/dumpstack/64: Use cpu_entry_area instead of orig_ist")
Reported-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Cyrill Gorcunov <gorcunov@gmail.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1910231950590.1852@nanos.tec.linutronix.de
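The guard amounts to an early exit of roughly this shape (illustrative sketch):

  /*
   * Illustrative sketch: bail out if the per-CPU exception-stack area has
   * not been set up yet, so that a zero base can never cause a bogus low
   * address to be classified as an exception-stack address.
   */
  begin = (unsigned long)__this_cpu_read(cea_exception_stacks);
  if (!begin)
          return false;
  end = begin + sizeof(struct cea_exception_stacks);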
-
Committed by Jan Beulich

The removal of the LDR initialization in the bigsmp_32 APIC code unearthed a problem in setup_local_APIC(). The code checks unconditionally for a mismatch of the logical APIC id by comparing the early APIC id, which was initialized in get_smp_config(), with the actual LDR value in the APIC. Due to the removal of the bogus LDR initialization, the check can now trigger on bigsmp_32 APIC systems, emitting a warning for every booting CPU. This is of course a false positive because the APIC is not using logical destination mode.

Restrict the check and the possibly resulting fixup to systems which are actually using the APIC in logical destination mode.

[ tglx: Massaged changelog and added Cc stable ]

Fixes: bae3a8d3 ("x86/apic: Do not initialize LDR and DFR for bigsmp")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/666d8f91-b5a8-1afd-7add-821e72a35f03@suse.com
-
Committed by Huacai Chen

The update of the VDSO data depends on __arch_use_vsyscall() returning True. This is a leftover from the attempt to map the features of various architectures 1:1 into generic code.

The usage of __arch_use_vsyscall() in the actual vsyscall implementations got dropped and replaced by the requirement for the architecture code to return U64_MAX if the global clocksource is not usable in the VDSO. But the __arch_use_vsyscall() check in the update code stayed, which causes the VDSO data to be stale or invalid when an architecture actually implements that function and returns False when the current clocksource is not usable in the VDSO.

As a consequence the VDSO implementations of clock_getres(), time(), and clock_gettime(CLOCK_.*_COARSE) operate on invalid data and return bogus information.

Remove the __arch_use_vsyscall() check from the VDSO update function and update the VDSO data unconditionally.

[ tglx: Massaged changelog and removed the now useless implementations in asm-generic/ARM64/MIPS ]

Fixes: 44f57d78 ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul Burton <paul.burton@mips.com>
Cc: linux-mips@vger.kernel.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1571887709-11447-1-git-send-email-chenhc@lemote.com
-
Committed by Junaid Shahid

The page table pages corresponding to broken-down large pages are zapped in FIFO order, so that the large page can potentially be recovered if it is no longer being used for execution. This removes the performance penalty for walking deeper EPT page tables.

By default, one large page will last about one hour once the guest reaches a steady state.

Signed-off-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
- 04 November 2019, 5 commits
-
-
Committed by Paolo Bonzini

With some Intel processors, putting the same virtual address in the TLB as both a 4 KiB and 2 MiB page can confuse the instruction fetch unit and cause the processor to issue a machine check resulting in a CPU lockup. Unfortunately, when EPT page tables use huge pages, it is possible for a malicious guest to cause this situation.

Add a knob to mark huge pages as non-executable. When the nx_huge_pages parameter is enabled (and we are using EPT), all huge pages are marked as NX. If the guest attempts to execute in one of those pages, the page is broken down into 4K pages, which are then marked executable.

This is not an issue for shadow paging (except nested EPT), because then the host is in control of TLB flushes and the problematic situation cannot happen. With nested EPT the nested guest can again cause problems, so shadow and direct EPT are treated in the same way.

[ tglx: Fixup default to auto and massage wording a bit ]

Originally-by: Junaid Shahid <junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
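A rough sketch of the policy only (not the actual KVM MMU code): a large-page mapping loses execute permission while the knob is on, and the resulting execute fault is satisfied with executable 4K pages instead:

  /*
   * Illustrative sketch: strip execute permission from huge-page mappings
   * when nx_huge_pages is enabled; the execute fault that follows is then
   * handled by mapping 4K pages that keep the execute permission.
   */
  if (nx_huge_pages && level > PG_LEVEL_4K)
          access &= ~ACC_EXEC_MASK;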
-
Committed by Pawan Gupta

Add the new CPU family ATOM_TREMONT_D to the CPU vulnerability whitelist. ATOM_TREMONT_D is not affected by X86_BUG_ITLB_MULTIHIT. ATOM_TREMONT_D might have mitigations against other issues as well, but only the ITLB multihit mitigation is confirmed at this point.

Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Vineela Tummalapalli

Some processors may incur a machine check error, possibly resulting in an unrecoverable CPU lockup, when an instruction fetch encounters a TLB multi-hit in the instruction TLB. This can occur when the page size is changed along with either the physical address or cache type. The relevant erratum can be found here:

  https://bugzilla.kernel.org/show_bug.cgi?id=205195

There are other processors affected for which the erratum does not fully disclose the impact. This issue affects both bare-metal x86 page tables and EPT. It can be mitigated by either eliminating the use of large pages or by using careful TLB invalidations when changing the page size in the page tables.

Just like Spectre, Meltdown, L1TF and MDS, a new bit has been allocated in MSR_IA32_ARCH_CAPABILITIES (PSCHANGE_MC_NO) and will be set on CPUs which are mitigated against this issue.

Signed-off-by: Vineela Tummalapalli <vineela.tummalapalli@intel.com>
Co-developed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-
Committed by Lucas Stach

The GPIO handle is referencing the wrong GPIO, so the voltage did not actually change as intended. The pinmux is already correct, so just correct the GPIO number.

Fixes: 4a13b3be ("arm64: dts: imx: add Zii Ultra board support")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Shawn Guo <shawnguo@kernel.org>
-
Committed by Xiaochen Shen

When a mon group is being deleted, rdtgrp->flags is first set to RDT_DELETED in rdtgroup_rmdir_mon(). The rdtgrp structure is freed only after rdtgrp->waitcount has dropped to 0 in rdtgroup_kn_unlock() later.

During the window of deleting a mon group, if an application calls rdtgroup_mondata_show() to read mondata under this mon group, the 'rdtgrp' returned from rdtgroup_kn_lock_live() is a NULL pointer when rdtgrp->flags is RDT_DELETED. 'rdtgrp' is then passed along this path: rdtgroup_mondata_show() --> mon_event_read() --> mon_event_count(), which results in a NULL pointer dereference in mon_event_count().

Check 'rdtgrp' in rdtgroup_mondata_show(), and return -ENOENT immediately when reading mondata during the window of deleting a mon group.

Fixes: d89b7379 ("x86/intel_rdt/cqm: Add mon_data")
Signed-off-by: Xiaochen Shen <xiaochen.shen@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Fenghua Yu <fenghua.yu@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: pei.p.jia@intel.com
Cc: Reinette Chatre <reinette.chatre@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/1572326702-27577-1-git-send-email-xiaochen.shen@intel.com
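The fix boils down to a guard of this shape in rdtgroup_mondata_show() (a sketch; the error-path label is illustrative):

  rdtgrp = rdtgroup_kn_lock_live(of->kn);
  if (!rdtgrp) {
          /* the mon group is going away: fail the mondata read cleanly */
          ret = -ENOENT;
          goto out;       /* path that calls rdtgroup_kn_unlock() */
  }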
-
- 02 November 2019, 1 commit
-
-
Committed by Eric Dumazet

We have seen many crashes on powerpc hosts while loading bpf programs.

The problem here is that bpf_int_jit_compile() does a first pass to compute the program length. Then it allocates memory to store the generated program and calls bpf_jit_build_body() a second time (and a third time later).

What I have observed is that the second bpf_jit_build_body() could end up using a few more words than expected. If bpf_jit_binary_alloc() put the space for the program at the end of the allocated page, we then write to unmapped memory.

It appears that bpf_jit_emit_tail_call() calls bpf_jit_emit_common_epilogue() while ctx->seen might not be stable. Only after the second pass can we be sure ctx->seen won't change.

Trying to avoid a second pass seems quite complex and probably not worth it.

Fixes: ce076141 ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20191101033444.143741-1-edumazet@google.com
-
- 01 November 2019, 6 commits
-
-
Committed by Florian Fainelli

The Broadcom Brahma-B53 core is susceptible to the issue described by ARM64_ERRATUM_843419, so this commit enables the workaround to be applied when executing on that core.

Since there are now multiple entries to match, we must convert the existing ARM64_ERRATUM_843419 into an erratum list and use cpucap_multi_entry_cap_matches to match our entries.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
-
Committed by Florian Fainelli

Add the Brahma-B53 CPU (all versions) to the whitelists of CPUs for the SSB and Spectre v2 mitigations.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
-
Committed by Doug Berger

The Broadcom Brahma-B53 core is susceptible to the issue described by ARM64_ERRATUM_845719, so this commit enables the workaround to be applied when executing on that core.

Since there are now multiple entries to match, we must convert the existing ARM64_ERRATUM_845719 into an erratum list.

Signed-off-by: Doug Berger <opendmb@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
-
Committed by Heiko Carstens

The idle time reported in /proc/stat sometimes incorrectly contains huge values on s390. This is caused by a bug in arch_cpu_idle_time().

The kernel tries to figure out when a different cpu entered idle by accessing its per-cpu data structure. There is an ordering problem: if the remote cpu has an idle_enter value which is not zero and an idle_exit value which is zero, it is assumed to have been idle since "now". The "now" timestamp, however, is taken before the idle_enter value is read, which in turn means that "now" can be smaller than idle_enter of the remote cpu. Unconditionally subtracting idle_enter from "now" can thus lead to a negative value (aka a large unsigned value).

Fix this by moving the get_tod_clock() invocation out of the loop. While at it, also make the code a bit more readable.

A similar bug also exists for show_idle_time(). Fix that as well.

Cc: <stable@vger.kernel.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Committed by Ilya Leoshkevich

unwind_for_each_frame stops after the first frame if regs->gprs[15] <= sp. The reason is that, in case regs are specified, the first frame should be regs->psw.addr and the second frame should be sp->gprs[8]. However, currently the second frame is regs->gprs[15], which confuses outside_of_stack().

Fix by introducing a flag to distinguish this special case from unwinding the interrupt handler, for which the current behavior is appropriate.

Fixes: 78c98f90 ("s390/unwind: introduce stack unwind API")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: stable@vger.kernel.org # v5.2+
Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
-
Committed by Yihui ZENG

The problem is that we were putting the NUL terminator too far out:

  buf[sizeof(buf) - 1] = '\0';

If the user input isn't NUL terminated and they haven't initialized the whole buffer, then it leads to an info leak. The NUL terminator should be:

  buf[len - 1] = '\0';

Signed-off-by: Yihui Zeng <yzeng56@asu.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
[heiko.carstens@de.ibm.com: keep semantics of how *lenp and *ppos are handled]
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
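In context, the corrected pattern looks roughly like this (a sketch with assumed buffer handling, not the exact s390 sysctl handler):

  char buf[64];
  size_t len;

  len = min_t(size_t, *lenp, sizeof(buf));
  if (!len)
          return 0;
  if (copy_from_user(buf, buffer, len))
          return -EFAULT;
  buf[len - 1] = '\0';    /* terminate only what was actually copied */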
-
- 31 October 2019, 2 commits
-
-
Committed by Bjorn Andersson

The Kryo cores share errata 1009 with Falkor, so add their model definitions and enable it for them as well.

Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
[will: Update entry in silicon-errata.rst]
Signed-off-by: Will Deacon <will@kernel.org>
-
Committed by Paolo Bonzini

VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX already ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. Removing all the conditionals simplifies the code.

SVM does not have similar code, but it should, since recent AMD processors do support SMEP. So this patch also makes the code for the two vendors more similar while fixing NPT=0, CR0.WP=1 and CR4.SMEP=1 on AMD processors.

Cc: stable@vger.kernel.org
Cc: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-