1. 08 Sep 2020, 23 commits
  2. 04 Sep 2020, 5 commits
  3. 03 Sep 2020, 2 commits
    • x86/mm/32: Bring back vmalloc faulting on x86_32 · 4819e15f
      Committed by Joerg Roedel
      One cannot simply remove vmalloc faulting on x86-32. Upstream
      
      	commit: 7f0a002b ("x86/mm: remove vmalloc faulting")
      
      removed it on x86 altogether because the arch_sync_kernel_mappings()
      interface had previously been introduced. This interface
      added synchronization of vmalloc/ioremap page-table updates to all
      page-tables in the system at creation time and was thought to make
      vmalloc faulting obsolete.
      
      But that assumption was incredibly naive.
      
      It turned out that there is a race window between the time the vmalloc
      or ioremap code establishes a mapping and the time it synchronizes
      this change to other page-tables in the system.
      
      During this race window another CPU or thread can establish a vmalloc
      mapping which uses the same intermediate page-table entries (e.g. PMD
      or PUD) and does no synchronization in the end, because it found all
      necessary mappings already present in the kernel reference page-table.
      
      But when these intermediate page-table entries are not yet
      synchronized, the other CPU or thread will continue with a vmalloc
      address that is not yet mapped in the page-table it currently uses,
      causing an unhandled page fault and oops like below:
      
      	BUG: unable to handle page fault for address: fe80c000
      	#PF: supervisor write access in kernel mode
      	#PF: error_code(0x0002) - not-present page
      	*pde = 33183067 *pte = a8648163
      	Oops: 0002 [#1] SMP
      	CPU: 1 PID: 13514 Comm: cve-2017-17053 Tainted: G
      	...
      	Call Trace:
      	 ldt_dup_context+0x66/0x80
      	 dup_mm+0x2b3/0x480
      	 copy_process+0x133b/0x15c0
      	 _do_fork+0x94/0x3e0
      	 __ia32_sys_clone+0x67/0x80
      	 __do_fast_syscall_32+0x3f/0x70
      	 do_fast_syscall_32+0x29/0x60
      	 do_SYSENTER_32+0x15/0x20
      	 entry_SYSENTER_32+0x9f/0xf2
      	EIP: 0xb7eef549
      
      So the arch_sync_kernel_mappings() interface is racy, but removing it
      would mean re-introducing the vmalloc_sync_all() interface, which is
      even more awful. Keep arch_sync_kernel_mappings() in place and catch
      the race condition in the page-fault handler instead.
      
      Do a partial revert of the above commit to get vmalloc faulting on
      x86-32 back in place (a sketch of the faulting approach follows this
      entry).
      
      Fixes: 7f0a002b ("x86/mm: remove vmalloc faulting")
      Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200902155904.17544-1-joro@8bytes.org
      4819e15f
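      A minimal, hypothetical C sketch of the faulting approach described
      above, assuming a folded 32-bit paging layout: on a kernel-mode fault
      in the vmalloc area, the missing intermediate entry is copied from the
      kernel reference page-table (init_mm) into the page-table the CPU is
      currently using. The function name is invented and the helpers are
      simplified; this is not the actual arch/x86/mm/fault.c code.
      
      	#include <linux/mm.h>
      	#include <asm/pgtable.h>
      	#include <asm/processor.h>	/* read_cr3_pa() */
      
      	/* Sketch only: synchronize one vmalloc-area entry on demand. */
      	static int sketch_vmalloc_fault(unsigned long address)
      	{
      		pgd_t *pgd, *pgd_k;
      		pmd_t *pmd, *pmd_k;
      
      		/* Only faults in the vmalloc/ioremap area qualify. */
      		if (address < VMALLOC_START || address >= VMALLOC_END)
      			return -1;
      
      		/* Page-table the faulting context runs on (from CR3). */
      		pgd   = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);
      		/* Kernel reference page-table. */
      		pgd_k = init_mm.pgd + pgd_index(address);
      
      		if (!pgd_present(*pgd_k))
      			return -1;	/* not mapped in the reference table either */
      
      		pmd   = pmd_offset(pud_offset(p4d_offset(pgd,   address), address), address);
      		pmd_k = pmd_offset(pud_offset(p4d_offset(pgd_k, address), address), address);
      
      		if (!pmd_present(*pmd_k))
      			return -1;
      
      		/* Copy the entry the racy sync missed; the access then retries. */
      		if (!pmd_present(*pmd))
      			set_pmd(pmd, *pmd_k);
      
      		return 0;
      	}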
    • x86/cmdline: Disable jump tables for cmdline.c · aef0148f
      Committed by Arvind Sankar
      When CONFIG_RETPOLINE is disabled, Clang uses a jump table for the
      switch statement in cmdline_find_option (jump tables are disabled when
      CONFIG_RETPOLINE is enabled). This function is called very early in boot
      from sme_enable() if CONFIG_AMD_MEM_ENCRYPT is enabled. At this time,
      the kernel is still executing out of the identity mapping, but the jump
      table will contain virtual addresses.
      
      Fix this by disabling jump tables for cmdline.c when AMD_MEM_ENCRYPT is
      enabled (see the sketch after this entry).
      Signed-off-by: Arvind Sankar <nivedita@alum.mit.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lore.kernel.org/r/20200903023056.3914690-1-nivedita@alum.mit.edu
      aef0148f
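      For context, the switch in cmdline_find_option() is a small state
      machine of the kind Clang may lower to a jump table of absolute virtual
      addresses, which cannot be followed while the early SME code still runs
      from the identity mapping. The sketch below is a simplified, hypothetical
      stand-in for such a state machine, not the kernel's implementation; the
      actual fix is a build-time change, presumably along the lines of adding
      -fno-jump-tables to the CFLAGS for cmdline.o under CONFIG_AMD_MEM_ENCRYPT.
      
      	/* Illustrative word-by-word scan; the dense switch is jump-table bait. */
      	enum scan_state { ST_WORD_START, ST_WORD_CMP, ST_WORD_SKIP };
      
      	/*
      	 * Returns 1 if 'option' appears as a full word followed by '=',
      	 * a space or the end of the string; 0 otherwise.
      	 */
      	static int sketch_find_option(const char *cmdline, const char *option)
      	{
      		enum scan_state state = ST_WORD_START;
      		const char *opt = option;
      		char c;
      
      		do {
      			c = *cmdline++;
      			switch (state) {	/* may compile to "jmp *table(,%reg,4)" */
      			case ST_WORD_START:
      				opt = option;
      				if (c == *opt) {
      					opt++;
      					state = ST_WORD_CMP;
      				} else if (c != ' ' && c != '\0') {
      					state = ST_WORD_SKIP;
      				}
      				break;
      			case ST_WORD_CMP:
      				if (*opt == '\0' && (c == '=' || c == ' ' || c == '\0'))
      					return 1;	/* whole option matched */
      				if (c == *opt)
      					opt++;
      				else if (c == ' ' || c == '\0')
      					state = ST_WORD_START;
      				else
      					state = ST_WORD_SKIP;
      				break;
      			case ST_WORD_SKIP:
      				if (c == ' ')
      					state = ST_WORD_START;
      				break;
      			}
      		} while (c != '\0');
      
      		return 0;
      	}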
  4. 31 Aug 2020, 1 commit
  5. 30 Aug 2020, 1 commit
  6. 27 Aug 2020, 2 commits
    • x86/irq: Unbreak interrupt affinity setting · e027ffff
      Committed by Thomas Gleixner
      Several people reported that 5.8 broke the interrupt affinity setting
      mechanism.
      
      The consolidation of the entry code reused the regular exception entry
      code for device interrupts and changed how the vector number is conveyed
      from ptregs->orig_ax to a function argument.
      
      The low level entry uses the hardware error code slot to push the vector
      number onto the stack which is retrieved from there into a function
      argument and the slot on stack is set to -1.
      
      The reason for setting it to -1 is that the error code slot is at the
      position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
      indicates that the entry came via a syscall. If it's not set to a negative
      value then a signal delivery on return to userspace would try to restart a
      syscall. But there are other places which rely on pt_regs::orig_ax being a
      valid indicator for syscall entry.
      
      But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
      interrupt affinity setting mechanism, which was overlooked when this change
      was made.
      
      Moving interrupts on x86 happens in several steps. A new vector on a
      different CPU is allocated and the relevant interrupt source is
      reprogrammed to that. But that's racy and there might be an interrupt
      already in flight to the old vector. So the old vector is preserved until
      the first interrupt arrives on the new vector and the new target CPU. Once
      that happens the old vector is cleaned up, but this cleanup still depends
      on the vector number being stored in pt_regs::orig_ax, which is now -1.
      
      That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
      always false. As a consequence the interrupt is moved once, but then it
      cannot be moved anymore because the cleanup of the old vector never
      happens.
      
      There would be several ways to convey the vector information to that place
      in the guts of the interrupt handling, but on deeper inspection it turned
      out that this check is pointless and a leftover from the old affinity model
      of X86 which supported multi-CPU affinities. Under this model it was
      possible that an interrupt had an old and a new vector on the same CPU, so
      the vector match was required.
      
      Under the new model the effective affinity of an interrupt is always a
      single CPU from the requested affinity mask. If the affinity mask changes
      then either the interrupt stays on the CPU and on the same vector when that
      CPU is still in the new affinity mask or it is moved to a different CPU, but
      it is never moved to a different vector on the same CPU.
      
      Ergo the cleanup check for the matching vector number is not required
      and can be removed, which makes the dependency on pt_regs::orig_ax go
      away.
      
      The remaining check for new_cpu == smp_processor_id() is completely
      sufficient. If it matches then the interrupt was successfully migrated
      and the cleanup can proceed (see the sketch after this entry).
      
      For paranoia's sake, add a warning to the vector assignment code to
      validate that the assumption of never moving to a different vector on
      the same CPU holds.
      
      Fixes: 633260fa ("x86/irq: Convey vector as argument and not in ptregs")
      Reported-by: Alex bykov <alex.bykov@scylladb.com>
      Reported-by: Avi Kivity <avi@scylladb.com>
      Reported-by: Alexander Graf <graf@amazon.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Alexander Graf <graf@amazon.com>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de
      e027ffff
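      A hedged sketch of the simplified cleanup decision described above:
      arrival on the new target CPU alone completes the move, and the vector
      comparison against pt_regs::orig_ax is gone. The struct and helper
      names are illustrative stand-ins, not the actual
      arch/x86/kernel/apic/vector.c code.
      
      	#include <linux/smp.h>
      	#include <linux/types.h>
      
      	/* Illustrative per-interrupt state; not the real apic_chip_data. */
      	struct sketch_apic_data {
      		unsigned int	cpu;		/* target CPU of the new vector */
      		unsigned int	prev_vector;	/* old vector awaiting cleanup  */
      		bool		move_in_progress;
      	};
      
      	/* Hypothetical helper: IPIs the old CPU to release prev_vector. */
      	void sketch_send_cleanup_ipi(struct sketch_apic_data *apicd);
      
      	static void sketch_irq_complete_move(struct sketch_apic_data *apicd)
      	{
      		if (!apicd->move_in_progress)
      			return;
      
      		/*
      		 * Before the fix this also required a vector match against
      		 * pt_regs::orig_ax, which is now always -1, so the cleanup
      		 * never fired. Arriving on the new target CPU is sufficient
      		 * proof that the migration is complete.
      		 */
      		if (apicd->cpu == smp_processor_id())
      			sketch_send_cleanup_ipi(apicd);
      	}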
    • x86/hotplug: Silence APIC only after all interrupts are migrated · 52d6b926
      Committed by Ashok Raj
      There is a race when taking a CPU offline. Current code looks like this:
      
      native_cpu_disable()
      {
      	...
      	apic_soft_disable();
      	/*
      	 * Any existing set bits for pending interrupt to
      	 * this CPU are preserved and will be sent via IPI
      	 * to another CPU by fixup_irqs().
      	 */
      	cpu_disable_common();
      	{
      		....
      		/*
      		 * Race window happens here. Once local APIC has been
      		 * disabled any new interrupts from the device to
      		 * the old CPU are lost
      		 */
      		fixup_irqs(); // Too late to capture anything in IRR.
      		...
      	}
      }
      
      The fix is to disable the APIC *after* cpu_disable_common() (a sketch
      of the reordered sequence follows this entry).
      
      Testing was done with a USB NIC that provided a source of frequent
      interrupts. A script migrated interrupts to a specific CPU and
      then took that CPU offline.
      
      Fixes: 60dcaad5 ("x86/hotplug: Silence APIC and NMI when CPU is dead")
      Reported-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
      Tested-by: Evan Green <evgreen@chromium.org>
      Reviewed-by: Evan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/lkml/875zdarr4h.fsf@nanos.tec.linutronix.de/
      Link: https://lore.kernel.org/r/1598501530-45821-1-git-send-email-ashok.raj@intel.com
      52d6b926
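      A minimal sketch of the reordered offline path described above:
      interrupts are migrated via cpu_disable_common() (which calls
      fixup_irqs()) while the local APIC can still latch pending vectors into
      the IRR, and only then is the APIC silenced. Simplified for
      illustration; not a verbatim copy of arch/x86/kernel/smpboot.c.
      
      	#include <asm/apic.h>	/* apic_soft_disable() */
      	#include <asm/smp.h>	/* cpu_disable_common() */
      
      	int sketch_native_cpu_disable(void)
      	{
      		int ret;
      
      		/* Pre-existing sanity check (file-local in smpboot.c). */
      		ret = lapic_can_unplug_cpu();
      		if (ret)
      			return ret;
      
      		/*
      		 * Migrate interrupts away first; fixup_irqs() runs inside
      		 * cpu_disable_common() and can still observe pending bits in
      		 * the IRR because the local APIC is not yet disabled.
      		 */
      		cpu_disable_common();
      
      		/* Only now is it safe to silence the local APIC. */
      		apic_soft_disable();
      
      		return 0;
      	}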
  7. 26 Aug 2020, 3 commits
  8. 24 Aug 2020, 1 commit
  9. 22 Aug 2020, 1 commit
    • KVM: Pass MMU notifier range flags to kvm_unmap_hva_range() · fdfe7cbd
      Committed by Will Deacon
      The 'flags' field of 'struct mmu_notifier_range' is used to indicate
      whether invalidate_range_{start,end}() are permitted to block. In the
      case of kvm_mmu_notifier_invalidate_range_start(), this field is not
      forwarded on to the architecture-specific implementation of
      kvm_unmap_hva_range() and therefore the backend cannot sensibly decide
      whether or not to block.
      
      Add an extra 'flags' parameter to kvm_unmap_hva_range() so that
      architectures know whether or not they are permitted to block (see the
      sketch after this entry).
      
      Cc: <stable@vger.kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Will Deacon <will@kernel.org>
      Message-Id: <20200811102725.7121-2-will@kernel.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fdfe7cbd
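      A hedged sketch of the interface change: kvm_unmap_hva_range() now
      receives the notifier range flags, so a backend can test
      MMU_NOTIFIER_RANGE_BLOCKABLE before choosing a blocking path. The
      prototype follows the commit; the body and the helper it calls are
      illustrative, not any particular architecture's implementation.
      
      	#include <linux/kvm_host.h>
      	#include <linux/mmu_notifier.h>
      
      	/* Hypothetical helper standing in for the arch-specific unmap walk. */
      	int sketch_unmap_gfn_range(struct kvm *kvm, unsigned long start,
      				   unsigned long end, bool may_block);
      
      	int kvm_unmap_hva_range(struct kvm *kvm, unsigned long start,
      				unsigned long end, unsigned flags)
      	{
      		/* The backend can now see whether the caller allows blocking. */
      		bool may_block = flags & MMU_NOTIFIER_RANGE_BLOCKABLE;
      
      		/*
      		 * For example, only drop locks or reschedule between ranges
      		 * when may_block is true; stay non-blocking otherwise.
      		 */
      		return sketch_unmap_gfn_range(kvm, start, end, may_block);
      	}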
  10. 21 Aug 2020, 1 commit
    • x86/entry/64: Do not use RDPID in paranoid entry to accommodate KVM · 6a3ea3e6
      Committed by Sean Christopherson
      KVM has an optimization to avoid expensive MSR read/writes on
      VMENTER/EXIT. It caches the MSR values and restores them either when
      leaving the run loop, on preemption or when going out to user space.
      
      The affected MSRs are not required for kernel context operations. This
      changed with the recently introduced mechanism to handle FSGSBASE in the
      paranoid entry code which has to retrieve the kernel GSBASE value by
      accessing per CPU memory. The mechanism needs to retrieve the CPU number
      and uses either LSL or RDPID if the processor supports it.
      
      Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
      lazily restored MSRs, which means between the point where the guest value
      is written and the point of restore, MSR_TSC_AUX contains a random number.
      
      If an NMI or any other exception which uses the paranoid entry path happens
      in such a context, then RDPID returns the random guest MSR_TSC_AUX value.
      
      As a consequence this reads from the wrong memory location to retrieve the
      kernel GSBASE value. Kernel GS is used for all regular this_cpu_*()
      operations. If the GSBASE in the exception handler points to the per CPU
      memory of a different CPU then this has the obvious consequences of data
      corruption and crashes.
      
      As the paranoid entry path is the only place which accesses MSR_TSC_AUX
      (via RDPID) and the fallback via LSL is not significantly slower, remove
      the RDPID alternative from the entry path and always use LSL (see the
      sketch after this entry).
      
      The alternative would be to write MSR_TSC_AUX on every VMENTER and
      VMEXIT, which would inflict massive overhead on that code path.
      
      [ tglx: Rewrote changelog ]
      
      Fixes: eaad9812 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
      Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
      Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com
      6a3ea3e6
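      For reference, a hedged sketch of the CPU-number lookup contrast: RDPID
      returns MSR_TSC_AUX, which may still hold a guest value on this path,
      while LSL reads the limit of the per-CPU GDT CPUNODE segment, which
      always reflects the host. The snippet is modeled loosely on the
      kernel's vdso_read_cpunode() helper; the function name is invented and
      the paranoid entry itself uses an assembly macro, not this C.
      
      	#include <asm/segment.h>	/* __CPUNODE_SEG, VDSO_CPUNODE_MASK */
      
      	static inline unsigned int sketch_cpu_number_via_lsl(void)
      	{
      		unsigned int p;
      
      		/*
      		 * LSL on the CPUNODE segment selector yields the segment
      		 * limit, which encodes (node << 12) | cpu and is immune to
      		 * whatever a guest last wrote into MSR_TSC_AUX.
      		 */
      		asm("lsl %[seg], %[p]" : [p] "=r" (p) : [seg] "r" (__CPUNODE_SEG));
      
      		return p & VDSO_CPUNODE_MASK;	/* low 12 bits: CPU number */
      	}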