- 04 July 2020, 2 commits
By Sean Christopherson

Mark CR4.TSD as being possibly owned by the guest, as that is indeed the case on VMX. Without TSD tagged as possibly guest-owned, a targeted read of CR4 to get TSD could observe a stale value. This bug is benign in the current code base, as the sole consumer of TSD is the emulator (for RDTSC) and the emulator always "reads" the entirety of CR4 when grabbing bits.

Add a build-time assertion to ensure VMX doesn't hand over more CR4 bits without also updating x86.

Fixes: 52ce3c21 ("x86,kvm,vmx: Don't trap writes to CR4.TSD")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703040422.31536-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
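A minimal sketch of the kind of build-time assertion described; the constant names are illustrative assumptions, not necessarily the exact identifiers the patch uses:

    /*
     * Sketch only: fail the build if VMX lets the guest own a CR4 bit
     * that common x86 code does not also mark as possibly guest-owned.
     * Placed in a VMX setup function so BUILD_BUG_ON() is legal here.
     */
    BUILD_BUG_ON(KVM_CR4_GUEST_OWNED_BITS & ~KVM_POSSIBLE_CR4_GUEST_BITS);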
-
By Sean Christopherson

Inject a #GP on MOV CR4 if CR4.LA57 is toggled in 64-bit mode, which is illegal per Intel's SDM:

    CR4.LA57
        57-bit linear addresses (bit 12 of CR4) ... blah blah blah ...
        This bit cannot be modified in IA-32e mode.

Note, the pseudocode for MOV CR doesn't call out the fault condition, which is likely why the check was missed during initial development. This is arguably an SDM bug and will hopefully be fixed in a future release of the SDM.

Fixes: fd8cb433 ("KVM: MMU: Expose the LA57 feature to VM.")
Cc: stable@vger.kernel.org
Reported-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200703021714.5549-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
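A hedged sketch of where such a check fits in kvm_set_cr4(); the surrounding variable names are assumptions:

    /* Sketch: LA57 must not change while in 64-bit (IA-32e) mode. */
    unsigned long old_cr4 = kvm_read_cr4(vcpu);

    if (is_long_mode(vcpu) && ((cr4 ^ old_cr4) & X86_CR4_LA57))
            return 1;   /* caller turns this into an injected #GP */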
-
- 30 June 2020, 1 commit
By Paolo Bonzini

Bit 8 would be the "global" bit, which does not quite make sense for non-leaf page table entries. Intel ignores it; AMD ignores it in PDEs and PDPEs, but reserves it in PML4Es. Probably, earlier versions of the AMD manual documented it as reserved in PDPEs as well, and that behavior made it into KVM as well as kvm-unit-tests; fix it.

Cc: stable@vger.kernel.org
Reported-by: Nadav Amit <namit@vmware.com>
Fixes: a0c0feb5 ("KVM: x86: reserve bit 8 of non-leaf PDPEs and PML4Es in 64-bit mode on AMD", 2014-09-03)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 29 June 2020, 1 commit
By Wanpeng Li

Syzbot reported that:

    CPU: 1 PID: 6780 Comm: syz-executor153 Not tainted 5.7.0-syzkaller #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
    RIP: 0010:__apic_accept_irq+0x46/0xb80
    Call Trace:
     kvm_arch_async_page_present+0x7de/0x9e0
     kvm_check_async_pf_completion+0x18d/0x400
     kvm_arch_vcpu_ioctl_run+0x18bf/0x69f0
     kvm_vcpu_ioctl+0x46a/0xe20
     ksys_ioctl+0x11a/0x180
     __x64_sys_ioctl+0x6f/0xb0
     do_syscall_64+0xf6/0x7d0
     entry_SYSCALL_64_after_hwframe+0x49/0xb3

The testcase enables the APF mechanism in MSR_KVM_ASYNC_PF_EN with ASYNC_PF_INT enabled, without setting MSR_KVM_ASYNC_PF_INT first. Worse, interrupt-based APF 'page ready' event delivery depends on an in-kernel LAPIC, but we didn't bail out when the LAPIC is not in-kernel while the guest sets MSR_KVM_ASYNC_PF_EN, which causes the NULL pointer dereference in the host later. This patch fixes it.

Reported-by: syzbot+1bf777dfdde86d64b89b@syzkaller.appspotmail.com
Fixes: 2635b5c4 ("KVM: x86: interrupt based APF 'page ready' event delivery")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1593426391-8231-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
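A minimal sketch of the missing guard in the MSR-write path, where 'data' is the value the guest writes; the flag and helper names are given from memory and should be treated as assumptions:

    /* Sketch: refuse interrupt-based APF without an in-kernel LAPIC. */
    if ((data & KVM_ASYNC_PF_DELIVERY_AS_INT) && !lapic_in_kernel(vcpu))
            return 1;   /* fail the MSR write instead of crashing later */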
-
- 23 June 2020, 9 commits
By Marc Zyngier

When making a vPE non-resident because it has hit a blocking WFI, the doorbell can fire at any time after the write to the RD. Crucially, it can fire right between the write to GICR_VPENDBASER and the write to the pending_last field in the its_vpe structure. This means that we would overwrite pending_last with stale data, and potentially not wake up until some unrelated event (such as a timer interrupt) puts the vPE back on the CPU.

GICv4 isn't affected by this as we actively mask the doorbell on entering the guest, while GICv4.1 automatically manages doorbell delivery without any hypervisor-driven masking. Use the vpe_lock to synchronize such updates, which solves the problem altogether.

Fixes: ae699ad3 ("irqchip/gic-v4.1: Move doorbell management to the GICv4 abstraction layer")
Reported-by: Zenghui Yu <yuzenghui@huawei.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
-
By Sean Christopherson

Remove vcpu_vmx.host_pkru, which got left behind when PKRU support was moved to common x86 code. No functional change intended.

Fixes: 37486135 ("KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200617034123.25647-1-sean.j.christopherson@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Marcelo Tosatti

The Linux TSC calibration procedure is subject to small variations (it's common to see a +-1 kHz difference between reboots on a given CPU, for example). So migrating a guest between two hosts with identical processors can fail if there is a small variation in the calibrated TSC between them. Without TSC scaling, the current kernel interface will either return an error (if user_tsc_khz <= tsc_khz) or enable TSC catchup mode.

This change enables the following TSC tolerance check to accept KVM_SET_TSC_KHZ within tsc_tolerance_ppm (which is 250 ppm by default):

    /*
     * Compute the variation in TSC rate which is acceptable
     * within the range of tolerance and decide if the
     * rate being applied is within that bounds of the hardware
     * rate.  If so, no scaling or compensation need be done.
     */
    thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm);
    thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm);
    if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) {
            pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n",
                     user_tsc_khz, thresh_lo, thresh_hi);
            use_scaling = 1;
    }

The NTP daemon in the guest can correct this difference (NTP can correct up to 500 ppm).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Message-Id: <20200616114741.GA298183@fuller.cnet>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
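A worked example of the tolerance, with illustrative numbers not taken from the original message: with tsc_khz = 2,900,000, a 250 ppm tolerance is 2,900,000 x 250 / 1,000,000 = 725 kHz, so the accepted window is roughly [2,899,275, 2,900,725] kHz, and a +-1 kHz calibration delta between hosts now passes the check instead of forcing scaling or catchup mode.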
-
By Xiaoyao Li

Only the MSR address range 0x800 through 0x8ff is architecturally reserved and dedicated to accessing APIC registers in x2APIC mode.

Fixes: 0105d1a5 ("KVM: x2apic interface to lapic")
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200616073307.16440-1-xiaoyao.li@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
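A small self-contained sketch of the range check implied above; the helper name is an assumption:

    /* Sketch: the architectural x2APIC MSR window is 0x800-0x8ff only. */
    static bool is_x2apic_msr(u32 msr)
    {
            return msr >= 0x800 && msr <= 0x8ff;
    }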
-
By Sean Christopherson

Remove support for context switching between the guest's and host's desired UMWAIT_CONTROL. Propagating the guest's value to hardware isn't required for correct functionality, e.g. KVM intercepts reads and writes to the MSR, and the latency effects of the settings controlled by the MSR are not architecturally visible.

As a general rule, KVM should not allow the guest to control power management settings unless explicitly enabled by userspace, e.g. see KVM_CAP_X86_DISABLE_EXITS. E.g. Intel's SDM explicitly states that C0.2 can improve the performance of SMT siblings. A devious guest could disable C0.2 so as to improve the performance of their workloads to the detriment of workloads running in the host or on other VMs.

Wholesale removal of UMWAIT_CONTROL context switching also fixes a race condition where updates from the host may cause KVM to enter the guest with the incorrect value. Because updates are propagated to all CPUs via IPI (SMP function callback), the value in hardware may be stale with respect to the cached value and KVM could enter the guest with the wrong value in hardware. As above, the guest can't observe the bad value, but it's a weird and confusing wart in the implementation.

Removal also fixes the unnecessary usage of VMX's atomic load/store MSR lists. Using the lists is only necessary for MSRs that are required for correct functionality immediately upon VM-Enter/VM-Exit, e.g. EFER on old hardware, or for MSRs that need to-the-uop precision, e.g. perf related MSRs. For UMWAIT_CONTROL, the effects are only visible in the kernel via TPAUSE/delay(), and KVM doesn't do any form of delay in vcpu_vmx_run(). Using the atomic lists is undesirable as they are more expensive than direct RDMSR/WRMSR.

Furthermore, even if giving the guest control of the MSR is legitimate, e.g. in pass-through scenarios, it's not clear that the benefits would outweigh the overhead. E.g. saving and restoring an MSR across a VMX roundtrip costs ~250 cycles, and if the guest diverged from the host that cost would be paid on every run of the guest. In other words, if there is a legitimate use case then it should be enabled by a new per-VM capability.

Note, KVM still needs to emulate MSR_IA32_UMWAIT_CONTROL so that it can correctly expose other WAITPKG features to the guest, e.g. TPAUSE, UMWAIT and UMONITOR.

Fixes: 6e3ba4ab ("KVM: vmx: Emulate MSR IA32_UMWAIT_CONTROL")
Cc: stable@vger.kernel.org
Cc: Jingqi Liu <jingqi.liu@intel.com>
Cc: Tao Xu <tao3.xu@intel.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200623005135.10414-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
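For context, a hedged sketch of what the remaining read-side emulation can look like in the MSR-get switch; the helper and field names here are assumptions recalled from the VMX code, not quoted from the patch:

    case MSR_IA32_UMWAIT_CONTROL:
            if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
                    return 1;               /* #GP if WAITPKG not exposed */

            msr_info->data = vmx->msr_ia32_umwait_control;
            break;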
-
By Sean Christopherson

Explicitly pass the L2 GPA to kvm_arch_write_log_dirty(), which for all intents and purposes is vmx_write_pml_buffer(), instead of having the latter pull the GPA from vmcs.GUEST_PHYSICAL_ADDRESS. If the dirty bit update is the result of KVM emulation (rare for L2), then the GPA in the VMCS may be stale and/or hold a completely unrelated GPA.

Fixes: c5f983f6 ("nVMX: Implement emulated Page Modification Logging")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200622215832.22090-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Vitaly Kuznetsov

translate_gpa() returns a GPA, so assigning it to 'real_gfn' seems obviously wrong. There is no real issue because both 'gpa_t' and 'gfn_t' are u64, and we don't use the value in 'real_gfn' as a GFN: we do

    real_gfn = gpa_to_gfn(real_gfn);

instead. 'If you see a "buffalo" sign on an elephant's cage, do not trust your eyes', but let's fix it for good. No functional change intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200622151435.752560-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
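For reference, the GPA-to-GFN conversion is just a page shift; a sketch of the helper as it commonly appears in KVM headers:

    static inline gfn_t gpa_to_gfn(gpa_t gpa)
    {
            return (gfn_t)(gpa >> PAGE_SHIFT);
    }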
-
By Paolo Bonzini

The following race can cause lost map update events:

         cpu1                              cpu2

                                     apic_map_dirty = true
    ------------------------------------------------------------
    kvm_recalculate_apic_map:
         pass check
                                     mutex_lock(&kvm->arch.apic_map_lock);
                                     if (!kvm->arch.apic_map_dirty)
                                     and in process of updating map
    ------------------------------------------------------------
      other calls to apic_map_dirty = true
      might be too late for affected cpu
    ------------------------------------------------------------
                                     apic_map_dirty = false
    ------------------------------------------------------------
    kvm_recalculate_apic_map: bail out on
         if (!kvm->arch.apic_map_dirty)

To fix it, record the beginning of an update of the APIC map in apic_map_dirty. If another APIC map change switches apic_map_dirty back to DIRTY during the update, kvm_recalculate_apic_map should not make it CLEAN, and the other caller will go through the slow path.

Reported-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
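A self-contained sketch of the tri-state scheme described above; the state names are assumptions, and the mutex that serializes real callers is elided:

    enum { CLEAN, UPDATE_IN_PROGRESS, DIRTY };      /* assumed names */

    static atomic_t map_dirty = ATOMIC_INIT(CLEAN);

    static void recalculate_map(void)
    {
            /* Claim the update: DIRTY -> UPDATE_IN_PROGRESS.  A CLEAN
             * result means there is nothing to do.  (Real callers also
             * hold a mutex, so only one updater runs at a time.) */
            if (atomic_cmpxchg(&map_dirty, DIRTY, UPDATE_IN_PROGRESS) == CLEAN)
                    return;

            /* ... rebuild the map here ... */

            /* Mark CLEAN only if no writer set DIRTY during the rebuild;
             * if this cmpxchg fails, the next caller redoes the work. */
            atomic_cmpxchg(&map_dirty, UPDATE_IN_PROGRESS, CLEAN);
    }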
-
By Igor Mammedov

The guest fails to online a hotplugged CPU with the error

    smpboot: do_boot_cpu failed(-1) to wakeup CPU#4

It's caused by the fact that kvm_apic_set_state(), which used to call recalculate_apic_map() unconditionally and pulled the hotplugged CPU into the apic map, is now updating the map conditionally on state changes. In this case the APIC map is not considered dirty and is not updated.

Fix the issue by forcing an unconditional update from kvm_apic_set_state(), like it used to be.

Fixes: 4abaffce ("KVM: LAPIC: Recalculate apic map in batch")
Cc: stable@vger.kernel.org
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20200622160830.426022-1-imammedo@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 22 June 2020, 3 commits
By Andrew Jones

Ensure we're actually accounting run_delay before we claim that we'll expose it to the guest. If we're not, then we just pretend like steal time isn't supported in order to avoid any confusion.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200622142710.18677-1-drjones@redhat.com
-
By Steven Price

If SVE is enabled then 'ret' can be assigned the return value of kvm_vcpu_enable_sve(), which may be 0, causing later "goto out" sites to erroneously return 0 on failure rather than -EINVAL as expected. Remove the initialisation of 'ret' and make setting the return value explicit to avoid this situation in the future.

Fixes: 9a3cdf26 ("KVM: arm64/sve: Allow userspace to enable SVE for vcpus")
Cc: stable@vger.kernel.org
Reported-by: James Morse <james.morse@arm.com>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20200617105456.28245-1-steven.price@arm.com
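The bug class generalises well; a minimal sketch of the fixed shape, with vcpu_reset_ok() as an assumed stand-in for the other checks in the real function:

    static int sketch_vcpu_reset(struct kvm_vcpu *vcpu)
    {
            int ret;                        /* no blanket "ret = 0" */

            if (vcpu_has_sve(vcpu)) {
                    ret = kvm_vcpu_enable_sve(vcpu);
                    if (ret)
                            goto out;       /* propagate the real error */
            }

            if (!vcpu_reset_ok(vcpu)) {     /* assumed helper */
                    ret = -EINVAL;          /* every failure path sets ret */
                    goto out;
            }

            ret = 0;                        /* success is explicit too */
    out:
            return ret;
    }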
-
By Alexandru Elisei

The "inline" keyword is a hint for the compiler to inline a function. The functions system_uses_irq_prio_masking() and gic_write_pmr() are used by the code running at EL2 on a non-VHE system, so mark them as __always_inline to make sure they'll always be part of the .hyp.text section. This fixes the following splat when trying to run a VM:

    [   47.625273] Kernel panic - not syncing: HYP panic:
    [   47.625273] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
    [   47.625273] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
    [   47.625273] VCPU:0000000000000000
    [   47.647261] CPU: 1 PID: 217 Comm: kvm-vcpu-0 Not tainted 5.8.0-rc1-ARCH+ #61
    [   47.654508] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT)
    [   47.661139] Call trace:
    [   47.663659]  dump_backtrace+0x0/0x1cc
    [   47.667413]  show_stack+0x18/0x24
    [   47.670822]  dump_stack+0xb8/0x108
    [   47.674312]  panic+0x124/0x2f4
    [   47.677446]  panic+0x0/0x2f4
    [   47.680407] SMP: stopping secondary CPUs
    [   47.684439] Kernel Offset: disabled
    [   47.688018] CPU features: 0x240402,20002008
    [   47.692318] Memory Limit: none
    [   47.695465] ---[ end Kernel panic - not syncing: HYP panic:
    [   47.695465] PS:a00003c9 PC:0000ca0b42049fc4 ESR:86000006
    [   47.695465] FAR:0000ca0b42049fc4 HPFAR:0000000010001000 PAR:0000000000000000
    [   47.695465] VCPU:0000000000000000 ]---

The instruction abort was caused by the code running at EL2 trying to fetch an instruction which wasn't mapped in the EL2 translation tables. Using objdump showed the two functions as separate symbols in the .text section.

Fixes: 85738e05 ("arm64: kvm: Unmask PMR before entering guest")
Cc: stable@vger.kernel.org
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Acked-by: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/r/20200618171254.1596055-1-alexandru.elisei@arm.com
-
- 19 June 2020, 1 commit
By Vitaly Kuznetsov

Guest crashes are observed on a Cascade Lake system when 'perf top' is launched on the host, e.g.

    BUG: unable to handle kernel paging request at fffffe0000073038
    PGD 7ffa7067 P4D 7ffa7067 PUD 7ffa6067 PMD 7ffa5067 PTE ffffffffff120
    Oops: 0000 [#1] SMP PTI
    CPU: 1 PID: 1 Comm: systemd Not tainted 4.18.0+ #380
    ...
    Call Trace:
     serial8250_console_write+0xfe/0x1f0
     call_console_drivers.constprop.0+0x9d/0x120
     console_unlock+0x1ea/0x460

Call traces are different but the crash is imminent. The problem was blindly bisected to commit 041bc42c ("KVM: VMX: Micro-optimize vmexit time when not exposing PMU"). It was also confirmed that the issue goes away if the PMU is exposed to the guest.

With some instrumentation of the guest we can see what is being switched (when we do atomic_switch_perf_msrs()):

    vmx_vcpu_run: switching 2 msrs
    vmx_vcpu_run: switching MSR38f guest: 70000000d host: 70000000f
    vmx_vcpu_run: switching MSR3f1 guest: 0 host: 2

The current guess is that PEBS (MSR_IA32_PEBS_ENABLE, 0x3f1) is to blame. Regardless of whether the PMU is exposed to the guest or not, PEBS needs to be disabled upon switch.

This reverts commit 041bc42c.

Reported-by: Maxime Coquelin <maxime.coquelin@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200619094046.654019-1-vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 18 June 2020, 1 commit
By Christian Borntraeger

The current number of KVM_IRQCHIP_NUM_PINS results in an order-3 allocation (32 KiB) for each guest start/restart. This can result in OOM killer activity even with free swap when the memory is fragmented enough:

    kernel: qemu-system-s39 invoked oom-killer: gfp_mask=0x440dc0(GFP_KERNEL_ACCOUNT|__GFP_COMP|__GFP_ZERO), order=3, oom_score_adj=0
    kernel: CPU: 1 PID: 357274 Comm: qemu-system-s39 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
    kernel: Hardware name: IBM 8562 T02 Z06 (LPAR)
    kernel: Call Trace:
    kernel: ([<00000001f848fe2a>] show_stack+0x7a/0xc0)
    kernel:  [<00000001f8d3437a>] dump_stack+0x8a/0xc0
    kernel:  [<00000001f8687032>] dump_header+0x62/0x258
    kernel:  [<00000001f8686122>] oom_kill_process+0x172/0x180
    kernel:  [<00000001f8686abe>] out_of_memory+0xee/0x580
    kernel:  [<00000001f86e66b8>] __alloc_pages_slowpath+0xd18/0xe90
    kernel:  [<00000001f86e6ad4>] __alloc_pages_nodemask+0x2a4/0x320
    kernel:  [<00000001f86b1ab4>] kmalloc_order+0x34/0xb0
    kernel:  [<00000001f86b1b62>] kmalloc_order_trace+0x32/0xe0
    kernel:  [<00000001f84bb806>] kvm_set_irq_routing+0xa6/0x2e0
    kernel:  [<00000001f84c99a4>] kvm_arch_vm_ioctl+0x544/0x9e0
    kernel:  [<00000001f84b8936>] kvm_vm_ioctl+0x396/0x760
    kernel:  [<00000001f875df66>] do_vfs_ioctl+0x376/0x690
    kernel:  [<00000001f875e304>] ksys_ioctl+0x84/0xb0
    kernel:  [<00000001f875e39a>] __s390x_sys_ioctl+0x2a/0x40
    kernel:  [<00000001f8d55424>] system_call+0xd8/0x2c8

As far as I can tell, s390x does not use the iopins, as we bail out for anything other than KVM_IRQ_ROUTING_S390_ADAPTER, and the chip/pin is only used for KVM_IRQ_ROUTING_IRQCHIP. So let us use a small number to reduce the memory footprint.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/20200617083620.5409-1-borntraeger@de.ibm.com
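For scale: an order-3 allocation is 2^3 = 8 contiguous 4 KiB pages, i.e. the 32 KiB mentioned above, and those pages must be physically contiguous. That contiguity, not the total amount of free memory, is what a fragmented host fails to provide, which is why the OOM killer can trigger even with free swap.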
-
- 16 June 2020, 1 commit
By Sean Christopherson

Add is_intr_type() and is_intr_type_n() to consolidate the boilerplate code for querying a specific type of interrupt given an encoded value from VMCS.VM_{ENTER,EXIT}_INTR_INFO, with and without an associated vector respectively.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200609014518.26756-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
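A sketch of what such helpers look like, using the INTR_INFO_* mask names conventional in KVM's VMCS headers (treat the exact spellings as assumptions):

    static inline bool is_intr_type(u32 intr_info, u32 type)
    {
            const u32 mask = INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK;

            return (intr_info & mask) == (INTR_INFO_VALID_MASK | type);
    }

    static inline bool is_intr_type_n(u32 intr_info, u32 type, u8 vector)
    {
            const u32 mask = INTR_INFO_VALID_MASK | INTR_INFO_INTR_TYPE_MASK |
                             INTR_INFO_VECTOR_MASK;

            return (intr_info & mask) == (INTR_INFO_VALID_MASK | type | vector);
    }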
-
- 15 June 2020, 2 commits
By Qian Cai

For some reason, running a simple qemu-kvm command with KCSAN will reset AMD hosts. It turns out svm_vcpu_run() could not be instrumented. Disable it for now.

    # /usr/libexec/qemu-kvm -name ubuntu-18.04-server-cloudimg -cpu host -smp 2 -m 2G -hda ubuntu-18.04-server-cloudimg.qcow2

    === console output ===
    Kernel 5.6.0-next-20200408+ on an x86_64

    hp-dl385g10-05 login:

    <...host reset...>

    HPE ProLiant System BIOS A40 v1.20 (03/09/2018)
    (C) Copyright 1982-2018 Hewlett Packard Enterprise Development LP
    Early system initialization, please wait...

Signed-off-by: Qian Cai <cai@lca.pw>
Message-Id: <20200415153709.1559-1-cai@lca.pw>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
By Huacai Chen

During the KVM merging process, a CONFIG_CPU_LOONGSON64 guard in commit 7f2a83f1 ("KVM: MIPS: Add CPUCFG emulation for Loongson-3") went missing by accident. Add it back to avoid a build error.

Fixes: 7f2a83f1 ("KVM: MIPS: Add CPUCFG emulation for Loongson-3")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1592204215-28704-1-git-send-email-chenhc@lemote.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 14 June 2020, 1 commit
By Masahiro Yamada

Since commit 84af7a61 ("checkpatch: kconfig: prefer 'help' over '---help---'"), the number of '---help---' instances has been gradually decreasing, but there are still more than 2400. This commit finishes the conversion. While I touched the lines, I also fixed the indentation.

There is a variety of indentation styles found:

    a) 4 spaces + '---help---'
    b) 7 spaces + '---help---'
    c) 8 spaces + '---help---'
    d) 1 space + 1 tab + '---help---'
    e) 1 tab + '---help---'    (correct indentation)
    f) 1 tab + 1 space + '---help---'
    g) 1 tab + 2 spaces + '---help---'

In order to convert all of them to 1 tab + 'help', I ran the following command:

    $ find . -name 'Kconfig*' | xargs sed -i 's/^[[:space:]]*---help---/\thelp/'

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
-
- 13 June 2020, 12 commits
By Ard Biesheuvel

EFI on ARM only supports short descriptors, and given that it mandates that the MMU and caches are on, it is implied that booting in HYP mode is not supported. However, implementations of EFI exist (i.e., U-Boot) that ignore this requirement, which is not entirely unreasonable, given that it makes HYP mode inaccessible to the operating system.

So let's make sure that we can deal with this condition gracefully. We already tolerate booting the EFI stub with the caches off (even though this violates the EFI spec as well), and so we should deal with HYP mode boot with MMU and caches either on or off.

- When the MMU and caches are on, we can ignore the HYP stub altogether, since we can carry on executing at HYP. We do need to ensure that we disable the MMU at HYP before entering the kernel proper.

- When the MMU and caches are off, we have to drop to SVC mode so that we can set up the page tables using short descriptors. In this case, we need to install the HYP stub as usual, so that we can return to HYP mode before handing over to the kernel proper.

Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
-
By Chris Packham

ZBOOT_ROM_TEXT and ZBOOT_ROM_BSS are defined as 'hex' but had a default of "0". Kconfig will helpfully expand a text entry of 0 to 0x0, but because this is not the same as the default value, it was treated as being explicitly set when running 'make savedefconfig', so most arm defconfigs have CONFIG_ZBOOT_ROM_TEXT=0x0 and CONFIG_ZBOOT_ROM_BSS=0x0. Change the default to 0x0, which means that the next time the defconfigs are re-generated the spurious config entries will be removed.

Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
-
By Joerg Roedel

The patch introducing the struct was probably never compile-tested, because it sets a handler with a wrong function signature. Wrap the handler in a function with the correct signature to fix the build.

Fixes: 0f1c9688 ("tty/sysrq: alpha: export and use __sysrq_get_key_op()")
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Matt Turner

Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Xu Wang

Replace sg++ with sg = sg_next(sg).

Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Matt Turner <mattst88@gmail.com>
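Context for readers outside the scatterlist API: a bare sg++ assumes the list is a flat array, but scatterlists may be chained, with link entries pointing at the next chunk; sg_next(), or the for_each_sg() iterator built on it, follows those links. A minimal sketch, with process_segment() as an assumed callback:

    struct scatterlist *sg;
    int i;

    /* Correct traversal even across chained scatterlist segments. */
    for_each_sg(sgl, sg, nents, i)
            process_segment(sg_dma_address(sg), sg_dma_len(sg));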
-
By Mikulas Patocka

The commits cd0e00c1 and 92d7223a broke boot on the Alpha Avanti platform. The patches move memory barriers after a write before the write. The result is that if there's an iowrite followed by an ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes a hang with the serial port and real time clock.

This patch makes the barriers conform to the specification:

1. We add mb() before readX_relaxed and writeX_relaxed; memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier.

2. We add mb() before reads from the I/O space, so that if there's a write followed by a read, there is a barrier between them.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: cd0e00c1 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering")
Fixes: 92d7223a ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2")
Cc: stable@vger.kernel.org # v4.17+
Acked-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Reviewed-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Jason Yan

Fix the following coccicheck warning:

    arch/alpha/kernel/sys_eiger.c:179:2-3: Unneeded semicolon

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Jason Yan

Fix the following coccicheck warning:

    arch/alpha/kernel/osf_sys.c:680:2-3: Unneeded semicolon

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Chuhong Yuan

In commit b6b27355 ("tracing: Use str_has_prefix() instead of using fixed sizes") the newly introduced str_has_prefix() was used to replace error-prone strncmp(str, const, len) calls. Here, fix code with the same pattern.

Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
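A before/after sketch of the pattern (the string literal and flag are illustrative):

    /* Error-prone: the hard-coded length must match the literal exactly. */
    if (!strncmp(str, "processor", 9))
            found = true;

    /* Safer: no magic length; returns the prefix length (0 on mismatch). */
    if (str_has_prefix(str, "processor"))
            found = true;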
-
By Mikulas Patocka

Alpha incorrectly reports "0070-0080 : rtc" in /proc/ioports. Fix this, so that it is "0070-007f".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Enrico Weigelt

Formatting of Kconfig files doesn't look so pretty, so let the Great White Handkerchief come around and clean it up.

Signed-off-by: Enrico Weigelt, metux IT consult <info@metux.net>
Signed-off-by: Matt Turner <mattst88@gmail.com>
-
By Thomas Gleixner

The idea of conditionally calling into rcu_irq_enter() only when RCU is not watching turned out to be not completely thought through.

Paul noticed occasional premature ends of grace periods in RCU torture testing. Bisection led to the commit which made the invocation of rcu_irq_enter() conditional on !rcu_is_watching().

It turned out that this conditional breaks RCU assumptions about the idle task when the scheduler tick happens to be a nested interrupt. Nested interrupts can happen when the first interrupt invokes softirq processing on return, which enables interrupts. If that nested tick interrupt does not invoke rcu_irq_enter(), then RCU's irq-nesting checks will believe that this interrupt came directly from idle, which will cause RCU to report a quiescent state. Because this interrupt instead came from a softirq handler which might have been executing an RCU read-side critical section, this can cause the grace period to end prematurely.

Change the condition from !rcu_is_watching() to is_idle_task(current), which enforces that interrupts in the idle task unconditionally invoke rcu_irq_enter() independent of the RCU state. This is also correct vs. user mode entries in NOHZ full scenarios, because user mode entries bring RCU out of EQS and force the RCU irq nesting state accounting to nested. As only the first interrupt can enter from user mode, a nested tick interrupt will enter from kernel mode, and as the nesting state accounting is forced to nesting, it will not do anything stupid even if rcu_irq_enter() has not been invoked.

Fixes: 3eeec385 ("x86/entry: Provide idtentry_entry/exit_cond_rcu()")
Reported-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: "Paul E. McKenney" <paulmck@kernel.org>
Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lkml.kernel.org/r/87wo4cxubv.fsf@nanos.tec.linutronix.de
-
- 12 June 2020, 6 commits
By Thomas Gleixner

For no reason other than beginning brainmelt, IDTENTRY_NMI was mapped to IDTENTRY_IST. This is not a problem on 64-bit, because the IST default entry point maps to IDTENTRY_RAW, which does not do any entry handling. The surplus function declaration for the noist C entry point is unused and, as there is no ASM code emitted for NMI, this went unnoticed.

On 32-bit, IDTENTRY_IST maps to a regular IDTENTRY which does the normal entry handling. That is clearly the wrong thing to do for NMI. Map it to IDTENTRY_RAW to unbreak it. The IDTENTRY_NMI mapping needs to stay to avoid emitting ASM code.

Fixes: 6271fef0 ("x86/entry: Convert NMI to IDTENTRY_NMI")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Debugged-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/CA+G9fYvF3cyrY+-iw_SZtpN-i2qA2BruHg4M=QYECU2-dNdsMw@mail.gmail.com
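The shape of the fix, as a hedged sketch; the exact macro spellings in the idtentry header are assumptions:

    /* Route NMI through the raw entry macros: no conditional RCU/entry
     * work on 32-bit, and no surplus noist declaration emitted. */
    #define DECLARE_IDTENTRY_NMI        DECLARE_IDTENTRY_RAW
    #define DEFINE_IDTENTRY_NMI         DEFINE_IDTENTRY_RAW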
-
By Andy Lutomirski

BUG/WARN are cleverly optimized using UD2 to handle the BUG/WARN out of line in an exception fixup. But if BUG or WARN is issued in a funny RCU context, then the idtentry_enter...() path might helpfully WARN that the RCU context is invalid, which results in infinite recursion.

Split the BUG/WARN handling into an nmi_enter()/nmi_exit() path in exc_invalid_op() to increase the chance of surviving the experience.

[ tglx: Make the declaration match the implementation ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/f8fe40e0088749734b4435b554f73eee53dcf7a8.1591932307.git.luto@kernel.org
-
By Alexey Kardashevskiy

Before commit 6cdf3037 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table") we called __find_linux_pte() with a page table pointer from a kvm_nested_guest struct, but now we rely on kvmhv_find_nested(), which takes an L1 LPID and returns a kvm_nested_guest pointer; however, we pass an L0 LPID there and the L2 guest hangs. This fixes the LPID passed to kvmppc_hv_handle_set_rc().

Fixes: 6cdf3037 ("powerpc/kvm/book3s: Use kvm helpers to walk shadow or secondary table")
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20200611030559.75257-1-aik@ozlabs.ru
-
By Ley Foon Tan

Mark switch cases where we are expecting to fall through. Fix the following warning through the use of the new pseudo-keyword fallthrough:

    arch/nios2/kernel/signal.c:254:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
      254 |    restart = -2;
          |    ~~~~~~~~^~~~
    arch/nios2/kernel/signal.c:255:3: note: here
      255 |   case ERESTARTNOHAND:
          |   ^~~~

Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Ley Foon Tan <ley.foon.tan@intel.com>
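A sketch modeled on the warning above; the surrounding logic is assumed, and only the annotation is the point:

    switch (regs->r2) {
    case ERESTART_RESTARTBLOCK:
            restart = -2;
            fallthrough;        /* documents the intentional fall-through */
    case ERESTARTNOHAND:
            regs->r2 = -EINTR;
            break;
    }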
-
By Joerg Roedel

The patch introducing the struct was probably never compile-tested, because it sets a handler with a wrong function signature. Wrap the handler in a function with the correct signature to fix the build.

Fixes: 0f1c9688 ("tty/sysrq: alpha: export and use __sysrq_get_key_op()")
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Paolo Bonzini

__kvm_set_memory_region() does not use the hva at all, so trying to catch use-after-delete is pointless and, worse, it fails access_ok now that we apply it to all memslots, including private kernel ones. This fixes an AVIC regression.

Fixes: 09d952c9 ("KVM: check userspace_addr for all memslots")
Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-