- 11 9月, 2020 1 次提交
-
-
由 Anshuman Khandual 提交于
pmd_present() and pmd_trans_huge() are expected to behave in the following manner during various phases of a given PMD. It is derived from a previous detailed discussion on this topic [1] and present THP documentation [2]. pmd_present(pmd): - Returns true if pmd refers to system RAM with a valid pmd_page(pmd) - Returns false if pmd refers to a migration or swap entry pmd_trans_huge(pmd): - Returns true if pmd refers to system RAM and is a trans huge mapping ------------------------------------------------------------------------- | PMD states | pmd_present | pmd_trans_huge | ------------------------------------------------------------------------- | Mapped | Yes | Yes | ------------------------------------------------------------------------- | Splitting | Yes | Yes | ------------------------------------------------------------------------- | Migration/Swap | No | No | ------------------------------------------------------------------------- The problem: PMD is first invalidated with pmdp_invalidate() before it's splitting. This invalidation clears PMD_SECT_VALID as below. PMD Split -> pmdp_invalidate() -> pmd_mkinvalid -> Clears PMD_SECT_VALID Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re- affirm pmd_present() as true during the THP split process. To comply with above mentioned semantics, pmd_trans_huge() should also check pmd_present() first before testing presence of an actual transparent huge mapping. The solution: Ideally PMD_TYPE_SECT should have been used here instead. But it shares the bit position with PMD_SECT_VALID which is used for THP invalidation. Hence it will not be there for pmd_present() check after pmdp_invalidate(). A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD entry during invalidation which can help pmd_present() return true and in recognizing the fact that it still points to memory. This bit is transient. During the split process it will be overridden by a page table page representing normal pages in place of erstwhile huge page. Other pmdp_invalidate() callers always write a fresh PMD value on the entry overriding this transient PMD_PRESENT_INVALID bit, which makes it safe. [1]: https://lkml.org/lkml/2018/10/17/231 [2]: https://www.kernel.org/doc/Documentation/vm/transhuge.txtSigned-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NCatalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-2-git-send-email-anshuman.khandual@arm.comSigned-off-by: NWill Deacon <will@kernel.org>
-
- 08 9月, 2020 1 次提交
-
-
由 Anshuman Khandual 提交于
Kernel virtual region [BPF_JIT_REGION_START..BPF_JIT_REGION_END] is missing from address_markers[], hence relevant page table entries are not displayed with /sys/kernel/debug/kernel_page_tables. This adds those missing markers. While here, also rename arch/arm64/mm/dump.c which sounds bit ambiguous, as arch/arm64/mm/ptdump.c instead. Signed-off-by: NAnshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: NSteven Price <steven.price@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599208259-11191-1-git-send-email-anshuman.khandual@arm.comSigned-off-by: NWill Deacon <will@kernel.org>
-
- 28 8月, 2020 7 次提交
-
-
由 James Morse 提交于
AT instructions do a translation table walk and return the result, or the fault in PAR_EL1. KVM uses these to find the IPA when the value is not provided by the CPU in HPFAR_EL1. If a translation table walk causes an external abort it is taken as an exception, even if it was due to an AT instruction. (DDI0487F.a's D5.2.11 "Synchronous faults generated by address translation instructions") While we previously made KVM resilient to exceptions taken due to AT instructions, the device access causes mismatched attributes, and may occur speculatively. Prevent this, by forbidding a walk through memory described as device at stage2. Now such AT instructions will report a stage2 fault. Such a fault will cause KVM to restart the guest. If the AT instructions always walk the page tables, but guest execution uses the translation cached in the TLB, the guest can't make forward progress until the TLB entry is evicted. This isn't a problem, as since commit 5dcd0fdb ("KVM: arm64: Defer guest entry when an asynchronous exception is pending"), KVM will return to the host to process IRQs allowing the rest of the system to keep running. Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdb ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
KVM doesn't expect any synchronous exceptions when executing, any such exception leads to a panic(). AT instructions access the guest page tables, and can cause a synchronous external abort to be taken. The arm-arm is unclear on what should happen if the guest has configured the hardware update of the access-flag, and a memory type in TCR_EL1 that does not support atomic operations. B2.2.6 "Possible implementation restrictions on using atomic instructions" from DDI0487F.a lists synchronous external abort as a possible behaviour of atomic instructions that target memory that isn't writeback cacheable, but the page table walker may behave differently. Make KVM robust to synchronous exceptions caused by AT instructions. Add a get_user() style helper for AT instructions that returns -EFAULT if an exception was generated. While KVM's version of the exception table mixes synchronous and asynchronous exceptions, only one of these can occur at each location. Re-enter the guest when the AT instructions take an exception on the assumption the guest will take the same exception. This isn't guaranteed to make forward progress, as the AT instructions may always walk the page tables, but guest execution may use the translation cached in the TLB. This isn't a problem, as since commit 5dcd0fdb ("KVM: arm64: Defer guest entry when an asynchronous exception is pending"), KVM will return to the host to process IRQs allowing the rest of the system to keep running. Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdb ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
KVM has a one instruction window where it will allow an SError exception to be consumed by the hypervisor without treating it as a hypervisor bug. This is used to consume asynchronous external abort that were caused by the guest. As we are about to add another location that survives unexpected exceptions, generalise this code to make it behave like the host's extable. KVM's version has to be mapped to EL2 to be accessible on nVHE systems. The SError vaxorcism code is a one instruction window, so has two entries in the extable. Because the KVM code is copied for VHE and nVHE, we end up with four entries, half of which correspond with code that isn't mapped. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Frank van der Linden 提交于
vdso32 should only be installed if CONFIG_COMPAT_VDSO is enabled, since it's not even supposed to be compiled otherwise, and arm64 builds without a 32bit crosscompiler will fail. Fixes: 8d75785a ("ARM64: vdso32: Install vdso32 from vdso_install") Signed-off-by: NFrank van der Linden <fllinden@amazon.com> Cc: stable@vger.kernel.org [5.4+] Link: https://lore.kernel.org/r/20200827234012.19757-1-fllinden@amazon.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Sami Tolvanen 提交于
Commit 7c78f67e ("arm64: enable tlbi range instructions") breaks LLVM's integrated assembler, because -Wa,-march is only passed to external assemblers and therefore, the new instructions are not enabled when IAS is used. This change adds a common architecture version preamble, which can be used in inline assembly blocks that contain instructions that require a newer architecture version, and uses it to fix __TLBI_0 and __TLBI_1 with ARM64_TLB_RANGE. Fixes: 7c78f67e ("arm64: enable tlbi range instructions") Signed-off-by: NSami Tolvanen <samitolvanen@google.com> Tested-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Link: https://github.com/ClangBuiltLinux/linux/issues/1106 Link: https://lore.kernel.org/r/20200827203608.1225689-1-samitolvanen@google.comSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Christophe Leroy 提交于
low_sleep_handler() can't restore the context from virtual stack because the stack can hardly be accessed with MMU OFF. For now, disable VMAP stack when CONFIG_ADB_PMU is selected. Fixes: cd08f109 ("powerpc/32s: Enable CONFIG_VMAP_STACK") Cc: stable@vger.kernel.org # v5.6+ Reported-by: NGiuseppe Sacco <giuseppe@sguazz.it> Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/ec96c15bfa1a7415ab604ee1c98cd45779c08be0.1598553015.git.christophe.leroy@csgroup.eu
-
由 Gustavo A. R. Silva 提交于
Fallthrough annotations for consecutive default and case labels are not necessary. Reported-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 27 8月, 2020 9 次提交
-
-
由 Pratik Rajesh Sampat 提交于
cpuidle stop state implementation has minor optimizations for P10 where hardware preserves more SPR registers compared to P9. The current P9 driver works for P10, although does few extra save-restores. P9 driver can provide the required power management features like SMT thread folding and core level power savings on a P10 platform. Until the P10 stop driver is available, revert the commit which allows for only P9 systems to utilize cpuidle and blocks all idle stop states for P10. CPU idle states are enabled and tested on the P10 platform with this fix. This reverts commit 8747bf36. Fixes: 8747bf36 ("powerpc/powernv/idle: Replace CPU feature check with PVR check") Signed-off-by: NPratik Rajesh Sampat <psampat@linux.ibm.com> Reviewed-by: NVaidyanathan Srinivasan <svaidy@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200826082918.89306-1-psampat@linux.ibm.com
-
由 Athira Rajeev 提交于
IMC trace-mode uses MSR[HV/PR] bits to set the cpumode for the instruction pointer captured in each sample. The bits are fetched from the third double word of the trace record. Reading third double word from IMC trace record should use be64_to_cpu() along with READ_ONCE inorder to fetch correct MSR[HV/PR] bits. Patch addresses this change. Currently we are using PERF_RECORD_MISC_HYPERVISOR as cpumode if MSR HV is 1 and PR is 0 which means the address is from host counter. But using PERF_RECORD_MISC_HYPERVISOR for host counter data will fail to resolve the address -> symbol during "perf report" because perf tools side uses PERF_RECORD_MISC_KERNEL to represent the host counter data. Therefore, fix the trace imc sample data to use PERF_RECORD_MISC_KERNEL as cpumode for host kernel information. Fixes: 77ca3951 ("powerpc/perf: Add kernel support for new MSR[HV PR] bits in trace-imc") Signed-off-by: NAthira Rajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/1598424029-1662-1-git-send-email-atrajeev@linux.vnet.ibm.com
-
由 Alexey Kardashevskiy 提交于
The bhrb_filter_map ("The Branch History Rolling Buffer") callback is only defined in raw CPUs' power_pmu structs. The "architected" CPUs use generic_compat_pmu, which does not have this callback, and crashes occur if a user tries to enable branch stack for an event. This add a NULL pointer check for bhrb_filter_map() which behaves as if the callback returned an error. This does not add the same check for config_bhrb() as the only caller checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0. Fixes: be80e758 ("powerpc/perf: Add generic compat mode pmu driver") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: NMadhavan Srinivasan <maddy@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200602025612.62707-1-aik@ozlabs.ru
-
由 Michael Ellerman 提交于
The recent commit 01eb0187 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") changed some of the handling of floating point/vector restore. In particular it caused current->thread.fpexc_mode to be copied into the current MSR (via msr_check_and_set()), rather than just into regs->msr (which is moved into MSR on return to userspace). This can lead to a crash in the kernel if we take a floating point exception when restoring FPSCR: Oops: Exception in kernel mode, sig: 8 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV Modules linked in: CPU: 3 PID: 101213 Comm: ld64.so.2 Not tainted 5.9.0-rc1-00098-g18445bf4-dirty #9 NIP: c00000000000fbb4 LR: c00000000001a7ac CTR: c000000000183570 REGS: c0000016b7cfb3b0 TRAP: 0700 Not tainted (5.9.0-rc1-00098-g18445bf4-dirty) MSR: 900000000290b933 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44002444 XER: 00000000 CFAR: c00000000001a7a8 IRQMASK: 1 GPR00: c00000000001ae40 c0000016b7cfb640 c0000000011b7f00 c000001542a0f740 GPR04: c000001542a0f720 c000001542a0eb00 0000000000000900 c000001542a0eb00 GPR08: 000000000000000a 0000000000002000 9000000000009033 0000000000000000 GPR12: 0000000000004000 c0000017ffffd900 0000000000000001 c000000000df5a58 GPR16: c000000000e19c18 c0000000010e1123 0000000000000001 c000000000e1a638 GPR20: 0000000000000000 c0000000044b1d00 0000000000000000 c000001542a0f2a0 GPR24: 00000016c7fe0000 c000001542a0f720 c000000001c93da0 c000000000fe5f28 GPR28: c000001542a0f720 0000000000800000 c0000016b7cfbe90 0000000002802900 NIP load_fp_state+0x4/0x214 LR restore_math+0x17c/0x1f0 Call Trace: 0xc0000016b7cfb680 (unreliable) __switch_to+0x330/0x460 __schedule+0x318/0x920 schedule+0x74/0x140 schedule_timeout+0x318/0x3f0 wait_for_completion+0xc8/0x210 call_usermodehelper_exec+0x234/0x280 do_coredump+0xedc/0x13c0 get_signal+0x1d4/0xbe0 do_notify_resume+0x1a0/0x490 interrupt_exit_user_prepare+0x1c4/0x230 interrupt_return+0x14/0x1c0 Instruction dump: ebe10168 e88101a0 7c8ff120 382101e0 e8010010 7c0803a6 4e800020 790605c4 782905c4 7c0008a8 7c0008a8 c8030200 <fffe058e> 48000088 c8030000 c8230010 Fix it by only loading the fpexc_mode value into regs->msr. Also add a comment to explain that although VSX is subject to the value of fpexc_mode, we don't have to handle that separately because we only allow VSX to be enabled if FP is also enabled. Fixes: 01eb0187 ("powerpc/64s: Fix restore_math unnecessarily changing MSR") Reported-by: NMilton Miller <miltonm@us.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Reviewed-by: NNicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20200825093424.3967813-1-mpe@ellerman.id.au
-
由 Nicholas Piggin 提交于
Kernel entry sets PPR to HMT_MEDIUM by convention. The scv entry path missed this. Fixes: 7fa95f9a ("powerpc/64s: system call support for scv/rfscv instructions") Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200825075309.224184-1-npiggin@gmail.com
-
由 Thomas Gleixner 提交于
Several people reported that 5.8 broke the interrupt affinity setting mechanism. The consolidation of the entry code reused the regular exception entry code for device interrupts and changed the way how the vector number is conveyed from ptregs->orig_ax to a function argument. The low level entry uses the hardware error code slot to push the vector number onto the stack which is retrieved from there into a function argument and the slot on stack is set to -1. The reason for setting it to -1 is that the error code slot is at the position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax indicates that the entry came via a syscall. If it's not set to a negative value then a signal delivery on return to userspace would try to restart a syscall. But there are other places which rely on pt_regs::orig_ax being a valid indicator for syscall entry. But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the interrupt affinity setting mechanism, which was overlooked when this change was made. Moving interrupts on x86 happens in several steps. A new vector on a different CPU is allocated and the relevant interrupt source is reprogrammed to that. But that's racy and there might be an interrupt already in flight to the old vector. So the old vector is preserved until the first interrupt arrives on the new vector and the new target CPU. Once that happens the old vector is cleaned up, but this cleanup still depends on the vector number being stored in pt_regs::orig_ax, which is now -1. That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector always false. As a consequence the interrupt is moved once, but then it cannot be moved anymore because the cleanup of the old vector never happens. There would be several ways to convey the vector information to that place in the guts of the interrupt handling, but on deeper inspection it turned out that this check is pointless and a leftover from the old affinity model of X86 which supported multi-CPU affinities. Under this model it was possible that an interrupt had an old and a new vector on the same CPU, so the vector match was required. Under the new model the effective affinity of an interrupt is always a single CPU from the requested affinity mask. If the affinity mask changes then either the interrupt stays on the CPU and on the same vector when that CPU is still in the new affinity mask or it is moved to a different CPU, but it is never moved to a different vector on the same CPU. Ergo the cleanup check for the matching vector number is not required and can be removed which makes the dependency on pt_regs:orig_ax go away. The remaining check for new_cpu == smp_processsor_id() is completely sufficient. If it matches then the interrupt was successfully migrated and the cleanup can proceed. For paranoia sake add a warning into the vector assignment code to validate that the assumption of never moving to a different vector on the same CPU holds. Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs") Reported-by: NAlex bykov <alex.bykov@scylladb.com> Reported-by: NAvi Kivity <avi@scylladb.com> Reported-by: NAlexander Graf <graf@amazon.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NAlexander Graf <graf@amazon.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
-
由 Ashok Raj 提交于
There is a race when taking a CPU offline. Current code looks like this: native_cpu_disable() { ... apic_soft_disable(); /* * Any existing set bits for pending interrupt to * this CPU are preserved and will be sent via IPI * to another CPU by fixup_irqs(). */ cpu_disable_common(); { .... /* * Race window happens here. Once local APIC has been * disabled any new interrupts from the device to * the old CPU are lost */ fixup_irqs(); // Too late to capture anything in IRR. ... } } The fix is to disable the APIC *after* cpu_disable_common(). Testing was done with a USB NIC that provided a source of frequent interrupts. A script migrated interrupts to a specific CPU and then took that CPU offline. Fixes: 60dcaad5 ("x86/hotplug: Silence APIC and NMI when CPU is dead") Reported-by: NEvan Green <evgreen@chromium.org> Signed-off-by: NAshok Raj <ashok.raj@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMathias Nyman <mathias.nyman@linux.intel.com> Tested-by: NEvan Green <evgreen@chromium.org> Reviewed-by: NEvan Green <evgreen@chromium.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/ Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
-
由 Vasily Gorbik 提交于
The kernel currently crashes if 4-level paging is used. Add missing p4d_populate for just allocated pud entry. Fixes: 3e0d3e40 ("s390/vmem: consolidate vmem_add_range() and vmem_remove_range()") Reviewed-by: NGerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Sven Schnelle 提交于
Since commit a21ee605 ("lockdep: Change hardirq{s_enabled,_context} to per-cpu variables") the lockdep code itself uses percpu variables. This leads to recursions because the percpu macros are calling preempt_enable() which might call trace_preempt_on(). Signed-off-by: NSven Schnelle <svens@linux.ibm.com> Reviewed-by: NVasily Gorbik <gor@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 26 8月, 2020 7 次提交
-
-
由 Nicholas Piggin 提交于
Problem: raw_local_irq_save(); // software state on local_irq_save(); // software state off ... local_irq_restore(); // software state still off, because we don't enable IRQs raw_local_irq_restore(); // software state still off, *whoopsie* existing instances: - lock_acquire() raw_local_irq_save() __lock_acquire() arch_spin_lock(&graph_lock) pv_wait() := kvm_wait() (same or worse for Xen/HyperV) local_irq_save() - trace_clock_global() raw_local_irq_save() arch_spin_lock() pv_wait() := kvm_wait() local_irq_save() - apic_retrigger_irq() raw_local_irq_save() apic->send_IPI() := default_send_IPI_single_phys() local_irq_save() Possible solutions: A) make it work by enabling the tracing inside raw_*() B) make it work by keeping tracing disabled inside raw_*() C) call it broken and clean it up now Now, given that the only reason to use the raw_* variant is because you don't want tracing. Therefore A) seems like a weird option (although it can be done). C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just because there is one raw user, this strips the validation/tracing off for all the other users. So we pick B) and declare any code that ends up doing: raw_local_irq_save() local_irq_save() lockdep_assert_irqs_disabled(); broken. AFAICT this problem has existed forever, the only reason it came up is because commit: 859d069e ("lockdep: Prepare for NMI IRQ state tracking") changed IRQ tracing vs lockdep recursion and the first instance is fairly common, the other cases hardly ever happen. Signed-off-by: NNicholas Piggin <npiggin@gmail.com> [rewrote changelog] Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com
-
由 Peter Zijlstra 提交于
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Paul Burton <paulburton@kernel.org> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20200826101653.GE1362448@hirez.programming.kicks-ass.net
-
由 Peter Zijlstra 提交于
Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lkml.kernel.org/r/20200821085348.664425120@infradead.org
-
由 Peter Zijlstra 提交于
Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.604899379@infradead.org
-
由 Peter Zijlstra 提交于
Unused remnants Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.487040689@infradead.org
-
由 Peter Zijlstra 提交于
Remove trace_cpu_idle() from the arch_cpu_idle() implementations and put it in the generic code, right before disabling RCU. Gets rid of more trace_*_rcuidle() users. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.428433395@infradead.org
-
由 Peter Zijlstra 提交于
This allows moving the leave_mm() call into generic code before rcu_idle_enter(). Gets rid of more trace_*_rcuidle() users. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NMarco Elver <elver@google.com> Link: https://lkml.kernel.org/r/20200821085348.369441600@infradead.org
-
- 24 8月, 2020 3 次提交
-
-
由 Shawn Anastasio 提交于
Since migration of guests using SAO to ISA 3.1 hosts may cause issues, disable PROT_SAO in LPARs by default and introduce a new Kconfig option PPC_PROT_SAO_LPAR to allow users to enable it if desired. Signed-off-by: NShawn Anastasio <shawn@anastas.io> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-3-shawn@anastas.io
-
由 Shawn Anastasio 提交于
This reverts commit 5c9fa16e. Since PROT_SAO can still be useful for certain classes of software, reintroduce it. Concerns about guest migration for LPARs using SAO will be addressed next. Signed-off-by: NShawn Anastasio <shawn@anastas.io> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821185558.35561-2-shawn@anastas.io
-
由 Gustavo A. R. Silva 提交于
Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-throughSigned-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 22 8月, 2020 3 次提交
-
-
由 Will Deacon 提交于
When an MMU notifier call results in unmapping a range that spans multiple PGDs, we end up calling into cond_resched_lock() when crossing a PGD boundary, since this avoids running into RCU stalls during VM teardown. Unfortunately, if the VM is destroyed as a result of OOM, then blocking is not permitted and the call to the scheduler triggers the following BUG(): | BUG: sleeping function called from invalid context at arch/arm64/kvm/mmu.c:394 | in_atomic(): 1, irqs_disabled(): 0, non_block: 1, pid: 36, name: oom_reaper | INFO: lockdep is turned off. | CPU: 3 PID: 36 Comm: oom_reaper Not tainted 5.8.0 #1 | Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 | Call trace: | dump_backtrace+0x0/0x284 | show_stack+0x1c/0x28 | dump_stack+0xf0/0x1a4 | ___might_sleep+0x2bc/0x2cc | unmap_stage2_range+0x160/0x1ac | kvm_unmap_hva_range+0x1a0/0x1c8 | kvm_mmu_notifier_invalidate_range_start+0x8c/0xf8 | __mmu_notifier_invalidate_range_start+0x218/0x31c | mmu_notifier_invalidate_range_start_nonblock+0x78/0xb0 | __oom_reap_task_mm+0x128/0x268 | oom_reap_task+0xac/0x298 | oom_reaper+0x178/0x17c | kthread+0x1e4/0x1fc | ret_from_fork+0x10/0x30 Use the new 'flags' argument to kvm_unmap_hva_range() to ensure that we only reschedule if MMU_NOTIFIER_RANGE_BLOCKABLE is set in the notifier flags. Cc: <stable@vger.kernel.org> Fixes: 8b3405e3 ("kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd") Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-3-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Will Deacon 提交于
The 'flags' field of 'struct mmu_notifier_range' is used to indicate whether invalidate_range_{start,end}() are permitted to block. In the case of kvm_mmu_notifier_invalidate_range_start(), this field is not forwarded on to the architecture-specific implementation of kvm_unmap_hva_range() and therefore the backend cannot sensibly decide whether or not to block. Add an extra 'flags' parameter to kvm_unmap_hva_range() so that architectures are aware as to whether or not they are permitted to block. Cc: <stable@vger.kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Message-Id: <20200811102725.7121-2-will@kernel.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Stephen Boyd 提交于
Add the 32-bit vdso Makefile to the vdso_install rule so that 'make vdso_install' installs the 32-bit compat vdso when it is compiled. Fixes: a7f71a2c ("arm64: compat: Add vDSO") Signed-off-by: NStephen Boyd <swboyd@chromium.org> Reviewed-by: NVincenzo Frascino <vincenzo.frascino@arm.com> Acked-by: NWill Deacon <will@kernel.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Link: https://lore.kernel.org/r/20200818014950.42492-1-swboyd@chromium.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 21 8月, 2020 9 次提交
-
-
由 Sean Christopherson 提交于
KVM has an optmization to avoid expensive MRS read/writes on VMENTER/EXIT. It caches the MSR values and restores them either when leaving the run loop, on preemption or when going out to user space. The affected MSRs are not required for kernel context operations. This changed with the recently introduced mechanism to handle FSGSBASE in the paranoid entry code which has to retrieve the kernel GSBASE value by accessing per CPU memory. The mechanism needs to retrieve the CPU number and uses either LSL or RDPID if the processor supports it. Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and lazily restored MSRs, which means between the point where the guest value is written and the point of restore, MSR_TSC_AUX contains a random number. If an NMI or any other exception which uses the paranoid entry path happens in such a context, then RDPID returns the random guest MSR_TSC_AUX value. As a consequence this reads from the wrong memory location to retrieve the kernel GSBASE value. Kernel GS is used to for all regular this_cpu_*() operations. If the GSBASE in the exception handler points to the per CPU memory of a different CPU then this has the obvious consequences of data corruption and crashes. As the paranoid entry path is the only place which accesses MSR_TSX_AUX (via RDPID) and the fallback via LSL is not significantly slower, remove the RDPID alternative from the entry path and always use LSL. The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT which would be inflicting massive overhead on that code path. [ tglx: Rewrote changelog ] Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro") Reported-by: NTom Lendacky <thomas.lendacky@amd.com> Debugged-by: NTom Lendacky <thomas.lendacky@amd.com> Suggested-by: NAndy Lutomirski <luto@kernel.org> Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
-
由 Kajol Jain 提交于
Commit 792f73f7 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask") added cpumask file as part of hv-24x7 driver inside the interface folder. The cpumask file is supposed to be in the top folder of the PMU driver in order to make hotplug work. This patch fixes that issue and creates new group 'cpumask_attr_group' to add cpumask file and make sure it added in top folder. command:# cat /sys/devices/hv_24x7/cpumask 0 Fixes: 792f73f7 ("powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask") Signed-off-by: NKajol Jain <kjain@linux.ibm.com> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200821080610.123997-1-kjain@linux.ibm.com
-
由 Christophe Leroy 提交于
In is_module_segment(), when VMALLOC_END is over 0xf0000000, ALIGN(VMALLOC_END, SZ_256M) has value 0. In that case, addr >= ALIGN(VMALLOC_END, SZ_256M) is always true then is_module_segment() always returns false. Use (ALIGN(VMALLOC_END, SZ_256M) - 1) which will have value 0xffffffff and will be suitable for the comparison. Fixes: c4964331 ("powerpc/32s: Only leave NX unset on segments used for modules") Reported-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NChristophe Leroy <christophe.leroy@csgroup.eu> Tested-by: NAndreas Schwab <schwab@linux-m68k.org> Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/09fc73fe9c7423c6b4cf93f93df9bb0ed8eefab5.1597994047.git.christophe.leroy@csgroup.eu
-
由 Rob Herring 提交于
If guests don't have certain CPU erratum workarounds implemented, then there is a possibility a guest can deadlock the system. IOW, only trusted guests should be used on systems with the erratum. This is the case for Cortex-A57 erratum 832075. Signed-off-by: NRob Herring <robh@kernel.org> Acked-by: NWill Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: James Morse <james.morse@arm.com> Cc: Julien Thierry <julien.thierry.kdev@gmail.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/20200803193127.3012242-2-robh@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Marc Zyngier 提交于
As we can now switch from a system that isn't affected by 1418040 to a system that globally is affected, let's allow affected CPUs to come in at a later time. Signed-off-by: NMarc Zyngier <maz@kernel.org> Tested-by: NSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: NStephen Boyd <swboyd@chromium.org> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200731173824.107480-3-maz@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Marc Zyngier 提交于
Instead of dealing with erratum 1418040 on each entry and exit, let's move the handling to __switch_to() instead, which has several advantages: - It can be applied when it matters (switching between 32 and 64 bit tasks). - It is written in C (yay!) - It can rely on static keys rather than alternatives Signed-off-by: NMarc Zyngier <maz@kernel.org> Tested-by: NSai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Reviewed-by: NStephen Boyd <swboyd@chromium.org> Acked-by: NWill Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20200731173824.107480-2-maz@kernel.orgSigned-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 Bin Meng 提交于
This adds SiFive drivers to rv32_defconfig, to keep in sync with the 64-bit config. This is useful when testing 32-bit kernel with QEMU 'sifive_u' 32-bit machine. Signed-off-by: NBin Meng <bin.meng@windriver.com> Reviewed-by: NAnup Patel <anup@brainfault.org> Reviewed-by: NAlistair Francis <alistair.francis@wdc.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
由 Anup Patel 提交于
Right now the RISC-V timer driver is convoluted to support: 1. Linux RISC-V S-mode (with MMU) where it will use TIME CSR for clocksource and SBI timer calls for clockevent device. 2. Linux RISC-V M-mode (without MMU) where it will use CLINT MMIO counter register for clocksource and CLINT MMIO compare register for clockevent device. We now have a separate CLINT timer driver which also provide CLINT based IPI operations so let's remove CLINT MMIO related code from arch/riscv directory and RISC-V timer driver. Signed-off-by: NAnup Patel <anup.patel@wdc.com> Tested-by: NEmil Renner Berhing <kernel@esmil.dk> Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: NAtish Patra <atish.patra@wdc.com> Reviewed-by: NPalmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-
由 Anup Patel 提交于
We add mechanism to set custom IPI operations so that CLINT driver from drivers directory can provide custom IPI operations. Signed-off-by: NAnup Patel <anup.patel@wdc.com> Tested-by: NEmil Renner Berhing <kernel@esmil.dk> Reviewed-by: NAtish Patra <atish.patra@wdc.com> Reviewed-by: NPalmer Dabbelt <palmerdabbelt@google.com> Signed-off-by: NPalmer Dabbelt <palmerdabbelt@google.com>
-