- 30 11月, 2019 28 次提交
-
-
由 Vasily Gorbik 提交于
A comment in arch/s390/include/asm/unwind.h says: > If 'first_frame' is not zero unwind_start skips unwind frames until it > reaches the specified stack pointer. > The end of the unwinding is indicated with unwind_done, this can be true > right after unwind_start, e.g. with first_frame!=0 that can not be found. > unwind_next_frame skips to the next frame. > Once the unwind is completed unwind_error() can be used to check if there > has been a situation where the unwinder could not correctly understand > the tasks call chain. With this change backchain unwinder now comply with behaviour described. As well as matches orc unwinder implementation. Now unwinder starts from reliable state, i.e. __unwind_start own stack frame is taken or stack frame generated by __switch_to (ksp) - both known to be valid. In case of pt_regs %r15 is better match for pt_regs psw, than sometimes random "sp" caller passed. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Add unwinding from program check handler tests. Unwinder should be able to unwind through pt_regs stored by program check handler on task stack. Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Add unwinding from irq context tests. Unwinder should be able to unwind through irq stack to task stack up to task pt_regs. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Add stack name, sp and reliable information into test unwinding results. Also consider ip outside of kernel text as failure if the state is reported reliable. Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Add CALL_ON_STACK helper testing. Tests make sure that we can unwind from switched stack to original one up to task pt_regs (nodat -> task stack). UWM_SWITCH_STACK could not be used together with UWM_THREAD because get_stack_info explicitly restricts unwinding to task stack if task != current. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
CALL_ON_STACK defines and initializes register variables. Inline assembly which follows might trigger compiler to generate memory access for "stack" argument (e.g. in case of S390_lowcore.nodat_stack). This memory access produces a function call under kasan with outline instrumentation which clobbers registers. Switch "stack" argument in CALL_ON_STACK helper to use memory reference constraint and perform load instead. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Currently unwinder test passes if unwinding results contain unwindme_func2 and unwindme_func1 functions. Now that unwinder reports success upon reaching task pt_regs, check that unwinding ended successfully in every test. Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Ilya Leoshkevich 提交于
unwind_for_each_frame can take at least 8 different sets of parameters. Add a test to make sure they all are handled in a sane way. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com> Co-developed-by: NVasily Gorbik <gor@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Always inline get_stack_pointer() to avoid potential problems due to compiler inlining decisions, i.e. getting stack pointer of get_stack_pointer() itself which is later reused. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Niklas Schnelle 提交于
The config option CONFIG_PCI_NR_FUNCTIONS sets a limit on the number of PCI functions we can support. Previously on reaching this limit there was no indication why newly attached devices are not recognized by Linux which could be quite confusing. Thus this patch adds a pr_err() for this case. Reviewed-by: NPeter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Niklas Schnelle 提交于
When UID checking was turned off during runtime in the underlying hypervisor, a PCI device may be attached with the same UID. This is already detected but happens silently. Add an error message so it can more easily be understood why a device was not added. Reviewed-by: NPeter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: NNiklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Thomas Richter 提交于
Each SBDT is located at a 4KB page and contains 512 entries. Each entry of a SDBT points to a SDB, a 4KB page containing sampled data. The last entry is a link to another SDBT page. When an event is created the function sequence executed is: __hw_perf_event_init() +--> allocate_buffers() +--> realloc_sampling_buffers() +---> alloc_sample_data_block() Both functions realloc_sampling_buffers() and alloc_sample_data_block() allocate pages and the allocation can fail. This is handled correctly and all allocated pages are freed and error -ENOMEM is returned to the top calling function. Finally the event is not created. Once the event has been created, the amount of initially allocated SDBT and SDB can be too low. This is detected during measurement interrupt handling, where the amount of lost samples is calculated. If the number of lost samples is too high considering sampling frequency and already allocated SBDs, the number of SDBs is enlarged during the next execution of cpumsf_pmu_enable(). If more SBDs need to be allocated, functions realloc_sampling_buffers() +---> alloc-sample_data_block() are called to allocate more pages. Page allocation may fail and the returned error is ignored. A SDBT and SDB setup already exists. However the modified SDBTs and SDBs might end up in a situation where the first entry of an SDBT does not point to an SDB, but another SDBT, basicly an SBDT without payload. This can not be handled by the interrupt handler, where an SDBT must have at least one entry pointing to an SBD. Add a check to avoid SDBTs with out payload (SDBs) when enlarging the buffer setup. Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Thomas Richter 提交于
The macro TEAR_REG() saves the last used SDBT address in the perf_hw_event structure. This is also done by function hw_reset_registers() which is a one-liner and simply uses macro TEAR_REG(). Remove function hw_reset_registers(), which is only used one time and use macro TEAR_REG() instead. This macro is used throughout the code anyway. Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Thomas Richter 提交于
In interrupt handling the function extend_sampling_buffer() is called after checking for a possibly extension. This check is not necessary as the called function itself performs this check again. Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Thomas Richter 提交于
Replace hard coded function names in debug statements by the "%s ...", __func__ construct suggested by checkpatch.pl script. Use consistent debug print format of the form variable blank value. Also add leading 0x for all hex values. Print allocated page addresses consistantly as hex numbers with leading 0x. Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Gerald Schaefer 提交于
The KASLR offset is added to vmcoreinfo in arch_crash_save_vmcoreinfo(), so that it can be found by crash when processing kernel dumps. However, arch_crash_save_vmcoreinfo() is called during a subsys_initcall, so if the kernel crashes before that, we have no vmcoreinfo and no KASLR offset. Fix this by storing the KASLR offset in the lowcore, where the vmcore_info pointer will be stored, and where it can be found by crash. In order to make it distinguishable from a real vmcore_info pointer, mark it as uneven (KASLR offset itself is aligned to THREAD_SIZE). When arch_crash_save_vmcoreinfo() stores the real vmcore_info pointer in the lowcore, it overwrites the KASLR offset. At that point, the KASLR offset is not yet added to vmcoreinfo, so we also need to move the mem_assign_absolute() behind the vmcoreinfo_append_str(). Fixes: b2d24b97 ("s390/kernel: add support for kernel address space layout randomization (KASLR)") Cc: <stable@vger.kernel.org> # v5.2+ Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Consider reaching task pt_regs graceful unwinder termination. Task pt_regs itself never contains a valid state to which a task might return within the kernel context (user task pt_regs is a special case). Since we already avoid printing user task pt_regs and in most cases we don't even bother filling task pt_regs psw and r15 with something reasonable simply skip task pt_regs altogether. With this change unwind_error() now accurately represent whether unwinder reached task pt_regs successfully or failed along the way. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Add missing allocation of pt_regs at the bottom of the stack. This makes it consistent with other stack setup cases and also what stack unwinder expects. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Currently unwinder yields 2 entries when pt_regs are met: sp="address of pt_regs itself" ip=pt_regs->psw sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]" And neither of those 2 states (combination of sp and ip) ever happened. reuse_sp has been introduced by commit a1d863ac ("s390/unwind: fix mixing regs and sp"). reuse_sp=true makes unwinder keen to produce the following result, when pt_regs are given (as an arg to unwind_start): sp=pt_regs->gprs[15] ip=pt_regs->psw sp=pt_regs->gprs[15] ip="r14 from stack frame pointed by pt_regs->gprs[15]" The first state is an actual state in which a task was when pt_regs were collected. The second state is marked unreliable and is for debugging purposes to cover the case when a task has been interrupted in between stack frame allocation and writing back_chain - in this case r14 might show an actual caller. Make unwinder behaviour enabled via reuse_sp=true default and drop the special case handling. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
If unwinder is looking at pt_regs which is not on stack then something went wrong and an error has to be reported rather than successful unwinding termination. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
CALL_ON_STACK is intended to be used for temporary stack switching with potential return to the caller. When CALL_ON_STACK is misused to switch from nodat stack to task stack back_chain information would later lead stack unwinder from task stack into (per cpu) nodat stack which is reused for other purposes. This would yield confusing unwinding result or errors. To avoid that introduce CALL_ON_STACK_NORETURN to be used instead. It makes sure that back_chain is zeroed and unwinder finishes gracefully ending up at task pt_regs. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Currently CALL_ON_STACK saves r15 as back_chain in the first stack frame of the stack we about to switch to. But if a function which uses CALL_ON_STACK calls other function it allocates a stack frame for a callee. In this case r15 is pointing to a callee stack frame and not a stack frame of function itself. This results in dummy unwinding entry with random sp and ip values. Introduce and utilize current_frame_address macro to get an address of actual function stack frame. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Avoid mixture of task == NULL and task == current meaning the same thing and simply always initialize task with current in unwind_start. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Make sure preemption is disabled when temporary switching to nodat stack with CALL_ON_STACK helper, because nodat stack is per cpu. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
disabled_wait uses _THIS_IP_ and assumes that compiler would inline it. Make sure this assumption is always correct by utilizing __always_inline. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Heiko Carstens 提交于
getcpu reads the required values for cpu and node with two instructions. This might lead to an inconsistent result if user space gets preempted and migrated to a different CPU between the two instructions. Fix this by using just a single instruction to read both values at once. This is currently rather a theoretical bug, since there is no real NUMA support available (except for NUMA emulation). Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Heiko Carstens 提交于
When a secondary CPU is brought up it must initialize its control registers. CPU A which triggers that a secondary CPU B is brought up stores its control register contents into the lowcore of new CPU B, which then loads these values on startup. This is problematic in various ways: the control register which contains the home space ASCE will correctly contain the kernel ASCE; however control registers for primary and secondary ASCEs are initialized with whatever values were present in CPU A. Typically: - the primary ASCE will contain the user process ASCE of the process that triggered onlining of CPU B. - the secondary ASCE will contain the percpu VDSO ASCE of CPU A. Due to lazy ASCE handling we may also end up with other combinations. When then CPU B switches to a different process (!= idle) it will fixup the primary ASCE. However the problem is that the (wrong) ASCE from CPU A was loaded into control register 1: as soon as an ASCE is attached (aka loaded) a CPU is free to generate TLB entries using that address space. Even though it is very unlikey that CPU B will actually generate such entries, this could result in TLB entries of the address space of the process that ran on CPU A. These entries shouldn't exist at all and could cause problems later on. Furthermore the secondary ASCE of CPU B will not be updated correctly. This means that processes may see wrong results or even crash if they access VDSO data on CPU B. The correct VDSO ASCE will eventually be loaded on return to user space as soon as the kernel executed a call to strnlen_user or an atomic futex operation on CPU B. Fix both issues by intializing the to be loaded control register contents with the correct ASCEs and also enforce (re-)loading of the ASCEs upon first context switch and return to user space. Fixes: 0aaba41b ("s390: remove all code using the access register mode") Cc: stable@vger.kernel.org # v4.15+ Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Ilya Leoshkevich 提交于
On s390 bpf_get_stack_raw_tp() returns 0 entries for both kernel and user stacks. While there is no practical unwinding solution for userspace on s390 at this moment, there certainly is a kernel unwinder. However, it is not properly integrated with BPF. In order to start unwinding, bpf_get_stack_raw_tp() obtains the current kernel register values using perf_fetch_caller_regs(), which is not implemented for s390. The actual unwinding then happens by passing those registers to perf_callchain_kernel(). Implement perf_arch_fetch_caller_regs() for s390, where __builtin_frame_address(0) points to back_chain. Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com> Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 21 11月, 2019 3 次提交
-
-
由 Pavel Tatashin 提交于
It is safer and simpler to drop the uaccess assembly macros in favour of inline C functions. Although this bloats the Image size slightly, it aligns our user copy routines with '{get,put}_user()' and generally makes the code a lot easier to reason about. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com> [will: tweaked commit message and changed temporary variable names] Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Pavel Tatashin 提交于
A number of our uaccess routines ('__arch_clear_user()' and '__arch_copy_{in,from,to}_user()') fail to re-enable PAN if they encounter an unhandled fault whilst accessing userspace. For CPUs implementing both hardware PAN and UAO, this bug has no effect when both extensions are in use by the kernel. For CPUs implementing hardware PAN but not UAO, this means that a kernel using hardware PAN may execute portions of code with PAN inadvertently disabled, opening us up to potential security vulnerabilities that rely on userspace access from within the kernel which would usually be prevented by this mechanism. In other words, parts of the kernel run the same way as they would on a CPU without PAN implemented/emulated at all. For CPUs not implementing hardware PAN and instead relying on software emulation via 'CONFIG_ARM64_SW_TTBR0_PAN=y', the impact is unfortunately much worse. Calling 'schedule()' with software PAN disabled means that the next task will execute in the kernel using the page-table and ASID of the previous process even after 'switch_mm()', since the actual hardware switch is deferred until return to userspace. At this point, or if there is a intermediate call to 'uaccess_enable()', the page-table and ASID of the new process are installed. Sadly, due to the changes introduced by KPTI, this is not an atomic operation and there is a very small window (two instructions) where the CPU is configured with the page-table of the old task and the ASID of the new task; a speculative access in this state is disastrous because it would corrupt the TLB entries for the new task with mappings from the previous address space. As Pavel explains: | I was able to reproduce memory corruption problem on Broadcom's SoC | ARMv8-A like this: | | Enable software perf-events with PERF_SAMPLE_CALLCHAIN so userland's | stack is accessed and copied. | | The test program performed the following on every CPU and forking | many processes: | | unsigned long *map = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, | MAP_SHARED | MAP_ANONYMOUS, -1, 0); | map[0] = getpid(); | sched_yield(); | if (map[0] != getpid()) { | fprintf(stderr, "Corruption detected!"); | } | munmap(map, PAGE_SIZE); | | From time to time I was getting map[0] to contain pid for a | different process. Ensure that PAN is re-enabled when returning after an unhandled user fault from our uaccess routines. Cc: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: NMark Rutland <mark.rutland@arm.com> Tested-by: NMark Rutland <mark.rutland@arm.com> Cc: <stable@vger.kernel.org> Fixes: 338d4f49 ("arm64: kernel: Add support for Privileged Access Never") Signed-off-by: NPavel Tatashin <pasha.tatashin@soleen.com> [will: rewrote commit message] Signed-off-by: NWill Deacon <will@kernel.org>
-
由 Thomas Richter 提交于
Linux-next commit titled "perf/core: Optimize perf_init_event()" changed the semantics of PMU device driver registration. It was done to speed up the lookup/handling of PMU device driver specific events. It also enforces that only one PMU device driver will be registered of type PERF_EVENT_RAW. This change added these line in function perf_pmu_register(): ... + ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL); + if (ret < 0) goto free_pdc; + + WARN_ON(type >= 0 && ret != type); The warn_on generates a message. We have 3 PMU device drivers, each registered as type PERF_TYPE_RAW. The cf_diag device driver (arch/s390/kernel/perf_cpumf_cf_diag.c) always hits the WARN_ON because it is the second PMU device driver (after sampling device driver arch/s390/kernel/perf_cpumf_sf.c) which is registered as type 4 (PERF_TYPE_RAW). So when the sampling device driver is registered, ret has value 4. When cf_diag device driver is registered with type 4, ret has value of 5 and WARN_ON fires. Adjust the PMU device drivers for s390 to support the new semantics required by perf_pmu_register(). Signed-off-by: NThomas Richter <tmricht@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 20 11月, 2019 6 次提交
-
-
由 Heiko Carstens 提交于
If an SMT capable system is not IPL'ed from the first CPU the setup of the physical to logical CPU mapping is broken: the IPL core gets CPU number 0, but then the next core gets CPU number 1. Correct would be that all SMT threads of CPU 0 get the subsequent logical CPU numbers. This is important since a lot of code (like e.g. the CPU topology code) assumes that CPU maps are setup like this. If the mapping is broken the system will not IPL due to broken topology masks: [ 1.716341] BUG: arch topology broken [ 1.716342] the SMT domain not a subset of the MC domain [ 1.716343] BUG: arch topology broken [ 1.716344] the MC domain not a subset of the BOOK domain This scenario can usually not happen since LPARs are always IPL'ed from CPU 0 and also re-IPL is intiated from CPU 0. However older kernels did initiate re-IPL on an arbitrary CPU. If therefore a re-IPL from an old kernel into a new kernel is initiated this may lead to crash. Fix this by setting up the physical to logical CPU mapping correctly. Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
vdso_per_cpu_data lowcore value is only needed for fully functional exception handlers, which are activated in setup_lowcore_dat_off. The same function does init vdso_per_cpu_data via vdso_alloc_boot_cpu. Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 Vasily Gorbik 提交于
Currently if the kernel is built with CONFIG_TRACE_IRQFLAGS and KASAN and used as crash kernel it crashes itself due to trace_hardirqs_off/trace_hardirqs_on being called with DAT off. This happens because trace_hardirqs_off/trace_hardirqs_on are instrumented and kasan code tries to perform access to shadow memory to validate memory accesses. Kasan shadow memory is populated with vmemmap, so all accesses require DAT on. memcpy_real could be called with DAT on or off (with kasan enabled DAT is set even before early code is executed). Make sure that trace_hardirqs_off/trace_hardirqs_on are called with DAT on and only actual __memcpy_real is called with DAT off. Also annotate __memcpy_real and _memcpy_real with __no_sanitize_address to avoid further problems due to switching DAT off. Reviewed-by: NPhilipp Rudo <prudo@linux.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
由 YueHaibing 提交于
s390_crypto_shash_parmsize() return type is int, it should not be stored in a unsigned variable, which compared with zero. Reported-by: NHulk Robot <hulkci@huawei.com> Fixes: 3c2eb6b7 ("s390/crypto: Support for SHA3 via CPACF (MSA6)") Signed-off-by: NYueHaibing <yuehaibing@huawei.com> Signed-off-by: NJoerg Schmidbauer <jschmidb@linux.vnet.ibm.com> Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
-
- 14 11月, 2019 3 次提交
-
-
由 Sean Christopherson 提交于
Acquire the per-VM slots_lock when zapping all shadow pages as part of toggling nx_huge_pages. The fast zap algorithm relies on exclusivity (via slots_lock) to identify obsolete vs. valid shadow pages, because it uses a single bit for its generation number. Holding slots_lock also obviates the need to acquire a read lock on the VM's srcu. Failing to take slots_lock when toggling nx_huge_pages allows multiple instances of kvm_mmu_zap_all_fast() to run concurrently, as the other user, KVM_SET_USER_MEMORY_REGION, does not take the global kvm_lock. (kvm_mmu_zap_all_fast() does take kvm->mmu_lock, but it can be temporarily dropped by kvm_zap_obsolete_pages(), so it is not enough to enforce exclusivity). Concurrent fast zap instances causes obsolete shadow pages to be incorrectly identified as valid due to the single bit generation number wrapping, which results in stale shadow pages being left in KVM's MMU and leads to all sorts of undesirable behavior. The bug is easily confirmed by running with CONFIG_PROVE_LOCKING and toggling nx_huge_pages via its module param. Note, until commit 4ae5acbc4936 ("KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()", 2019-11-13) the fast zap algorithm used an ulong-sized generation instead of relying on exclusivity for correctness, but all callers except the recently added set_nx_huge_pages() needed to hold slots_lock anyways. Therefore, this patch does not have to be backported to stable kernels. Given that toggling nx_huge_pages is by no means a fast path, force it to conform to the current approach instead of reintroducing the previous generation count. Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation", but NOT FOR STABLE) Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Masahiro Yamada 提交于
Since commit 54b8ae66 ("kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)"), sparc allmodconfig fails to build as follows: CC arch/sparc/vdso/vdso32/vclock_gettime.o unrecognized e_machine 18 arch/sparc/vdso/vdso32/vclock_gettime.o arch/sparc/vdso/vdso32/vclock_gettime.o: failed The cause of the breakage is that -pg flag not being dropped. The vdso32 files are located in the vdso32/ subdirectory, but I missed to update the Makefile. I removed the meaningless CFLAGS_REMOVE_vdso-note.o since it is only effective for C file. vdso-note.o is compiled from assembly file: arch/sparc/vdso/vdso-note.S arch/sparc/vdso/vdso32/vdso-note.S Fixes: 54b8ae66 ("kbuild: change *FLAGS_<basetarget>.o to take the path relative to $(obj)") Reported-by: NAnatoly Pugachev <matorola@gmail.com> Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Tested-by: NAnatoly Pugachev <matorola@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net>
-
由 Anders Roxell 提交于
When building allmodconfig KCONFIG_ALLCONFIG=$(pwd)/arch/arm64/configs/defconfig CONFIG_CPU_BIG_ENDIAN gets enabled. Which tends not to be what most people want. Another concern that has come up is that ACPI isn't built for an allmodconfig kernel today since that also depends on !CPU_BIG_ENDIAN. Rework so that we introduce a 'choice' and default the choice to CPU_LITTLE_ENDIAN. That means that when we build an allmodconfig kernel it will default to CPU_LITTLE_ENDIAN that most people tends to want. Reviewed-by: NJohn Garry <john.garry@huawei.com> Acked-by: NWill Deacon <will@kernel.org> Signed-off-by: NAnders Roxell <anders.roxell@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-