1. 05 Feb, 2020: 10 commits
  2. 04 Feb, 2020: 6 commits
  3. 01 Feb, 2020: 1 commit
    • x86/apic/msi: Plug non-maskable MSI affinity race · 6f1a4891
      Thomas Gleixner committed
      Evan tracked down a subtle race between the update of the MSI message and
      the device raising an interrupt internally on PCI devices which do not
      support MSI masking. The update of the MSI message is non-atomic and
      consists of either 2 or 3 sequential 32bit wide writes to the PCI config
      space.
      
         - Write address low 32bits
         - Write address high 32bits (If supported by device)
         - Write data
      
      When an interrupt is migrated, both address and data might change, so
      the kernel attempts to mask the MSI interrupt first. But MSI masking is
      optional, so there exist devices which do not provide it. That means that
      if the device raises an interrupt internally between the writes, an MSI
      message is sent that is built from half-updated state.
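      A minimal user-space model of the torn update follows. It is a sketch under stated assumptions, not kernel code: the struct only mirrors the field names of the kernel's struct msi_msg, and the helpers msi_after_writes() and msi_is_torn() are hypothetical. It shows that a message emitted after the first write carries the new destination together with the old vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standalone model; field names mirror the kernel's struct msi_msg. */
struct msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/*
 * Hypothetical helper: the message the device would emit if it raised an
 * interrupt after `writes_done` (0..3) of the sequential 32-bit config
 * space writes had landed. Write order: address low, address high, data.
 * (If address_hi is unchanged or unsupported, its write changes nothing.)
 */
static struct msi_msg msi_after_writes(struct msi_msg cur,
                                       const struct msi_msg *new_msg,
                                       unsigned int writes_done)
{
    if (writes_done >= 1)
        cur.address_lo = new_msg->address_lo;
    if (writes_done >= 2)
        cur.address_hi = new_msg->address_hi;
    if (writes_done >= 3)
        cur.data = new_msg->data;
    return cur;
}

static bool msi_equal(const struct msi_msg *a, const struct msi_msg *b)
{
    return a->address_lo == b->address_lo &&
           a->address_hi == b->address_hi &&
           a->data == b->data;
}

/* True if the emitted message matches neither the old nor the new state. */
static bool msi_is_torn(const struct msi_msg *old_msg,
                        const struct msi_msg *new_msg,
                        unsigned int writes_done)
{
    struct msi_msg seen = msi_after_writes(*old_msg, new_msg, writes_done);
    return !msi_equal(&seen, old_msg) && !msi_equal(&seen, new_msg);
}
```

      For example, moving from (CPU 1, vector 0x41) to (CPU 2, vector 0x52) yields a torn message after one or two writes: destination CPU 2 with the stale vector 0x41.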
      
      On x86 this can lead to spurious interrupts on the wrong interrupt
      vector when the affinity setting changes both address and data. As a
      consequence the device interrupt can be lost causing the device to
      become stuck or malfunctioning.
      
      Evan tried to handle that by disabling MSI across an MSI message
      update. That's not feasible because disabling MSI has issues on its own:
      
       If MSI is disabled, the PCI device routes interrupts to the legacy
       INTx mechanism. INTx delivery can be disabled, but the disablement
       does not work on all devices.
      
       Some devices lose interrupts when both MSI and INTx delivery are disabled.
      
      Another way to solve this would be to enforce the allocation of the same
      vector on all CPUs in the system for such broken devices. That could be
      done, but it would bring back the vector space exhaustion problems which
      were solved a few years ago.
      
      Fortunately the high address (if supported by the device) is only relevant
      when X2APIC is enabled, which implies interrupt remapping. With interrupt
      remapping, the affinity setting takes place in the interrupt remapping
      unit and the PCI MSI message is programmed only once, when the PCI device
      is initialized.
      
      That makes it possible to solve it with a two step update:
      
        1) Target the MSI msg to the new vector on the current target CPU
      
        2) Target the MSI msg to the new vector on the new target CPU
      
      In both cases writing the MSI message changes only a single 32-bit word,
      which prevents the inconsistency.
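      The two-step update can be checked with a small model. This sketch assumes a simplified x86 MSI encoding without interrupt remapping: destination APIC ID in address bits 19:12 of a 0xFEE00000-based address, vector in data bits 7:0, and every other flag bit zero (physical destination, fixed delivery, edge trigger). The helpers msi_compose(), msi_words_changed() and msi_two_step() are hypothetical. The point it illustrates is that step 1 changes only the data word and step 2 changes only the low address word.

```c
#include <stdint.h>

/* Standalone model; field names mirror the kernel's struct msi_msg. */
struct msi_msg {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
};

/* Simplified encoding (assumption): APIC ID in address bits 19:12,
 * vector in data bits 7:0, all other bits zero. */
static struct msi_msg msi_compose(uint8_t apic_id, uint8_t vector)
{
    struct msi_msg m = {
        .address_lo = 0xFEE00000u | ((uint32_t)apic_id << 12),
        .address_hi = 0, /* only relevant with X2APIC + remapping */
        .data = vector,
    };
    return m;
}

/* Number of 32-bit words that differ between two messages. */
static int msi_words_changed(const struct msi_msg *a, const struct msi_msg *b)
{
    return (a->address_lo != b->address_lo) +
           (a->address_hi != b->address_hi) +
           (a->data != b->data);
}

/* Intermediate and final messages of the two-step update. */
static void msi_two_step(uint8_t cur_cpu, uint8_t new_cpu, uint8_t new_vector,
                         struct msi_msg *step1, struct msi_msg *step2)
{
    *step1 = msi_compose(cur_cpu, new_vector); /* new vector, current CPU */
    *step2 = msi_compose(new_cpu, new_vector); /* new vector, new CPU */
}
```

      A direct update from (CPU 1, vector 0x41) to (CPU 2, vector 0x52) touches two words, while each of the two steps touches exactly one, so a single 32-bit config write never exposes a mixed state.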
      
      After writing the final destination it is necessary to check whether the
      device issued an interrupt while the intermediate state #1 (new vector,
      current CPU) was in effect.
      
      This is possible because the affinity change always happens on the
      current target CPU. The code runs with interrupts disabled, so the
      interrupt can be detected by checking the IRR of the local APIC. If the
      vector is pending in the IRR, the interrupt is retriggered on the new
      target CPU by sending an IPI for the associated vector to the target CPU.
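      The IRR check and retrigger decision can be sketched as follows. The local APIC IRR holds one bit per vector (256 bits), split into eight 32-bit registers; on hardware these sit at 16-byte strides starting at offset 0x200, but here they are modelled as a plain array. struct lapic_model, struct ipi_request and check_and_retrigger() are hypothetical names; the real code reads the APIC registers and sends an actual IPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the local APIC IRR: 256 bits, one per vector. */
struct lapic_model {
    uint32_t irr[8];
};

/* Hypothetical record of the IPI that would be sent. */
struct ipi_request {
    unsigned int cpu;
    unsigned int vector;
};

/* Precondition: vector < 256. Register index = vector / 32,
 * bit within the register = vector % 32. */
static bool vector_pending_in_irr(const struct lapic_model *apic,
                                  unsigned int vector)
{
    return apic->irr[vector / 32] & (1u << (vector % 32));
}

/*
 * After the final MSI write: if the device fired while the transitional
 * message (new vector, current CPU) was live, the new vector is pending
 * locally, so request a retrigger IPI for it on the new target CPU.
 */
static bool check_and_retrigger(const struct lapic_model *apic,
                                unsigned int new_vector, unsigned int new_cpu,
                                struct ipi_request *ipi)
{
    if (!vector_pending_in_irr(apic, new_vector))
        return false;
    ipi->cpu = new_cpu;
    ipi->vector = new_vector;
    return true;
}
```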
      
      This can cause spurious interrupts on both the local and the new target
      CPU.
      
       1) If the new vector is not in use on the local CPU and the device
          affected by the affinity change raised an interrupt during the
          transitional state (step #1 above) then interrupt entry code will
          ignore that spurious interrupt. The vector is marked so that the
          'No irq handler for vector' warning is suppressed once.
      
       2) If the new vector is already in use on the local CPU, the IRR check
          might see a pending interrupt from the device which is using this
          vector. The IPI to the new target CPU will then invoke the handler of
          the device which got the affinity change, even if that device did not
          issue an interrupt.
      
       3) If the new vector is in use already on the local CPU and the device
          affected by the affinity change raised an interrupt during the
          transitional state (step #1 above) then the handler of the device which
          uses that vector on the local CPU will be invoked.
      
      Cases #2 and #3 might expose issues in device driver interrupt handlers
      which are not prepared to handle a spurious interrupt correctly. This is
      not a regression; it just exposes something which was already broken, as
      spurious interrupts can happen for many reasons and all driver handlers
      need to be able to deal with them.
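      What "being able to deal with them" usually means is that a handler first checks whether its device actually raised the interrupt. Real kernel handlers report this by returning IRQ_NONE or IRQ_HANDLED; the sketch below uses its own stand-in enum and a hypothetical fake_dev with a status register, so it only illustrates the pattern and is not a driver.

```c
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum irq_result { IRQ_RESULT_NONE, IRQ_RESULT_HANDLED };

/* Hypothetical device: a status register with an "interrupt pending" bit. */
#define FAKE_DEV_IRQ_PENDING 0x1u

struct fake_dev {
    uint32_t status;       /* device status register contents */
    unsigned long events;  /* interrupts actually serviced */
};

static enum irq_result fake_dev_irq_handler(struct fake_dev *dev)
{
    /* Spurious invocation: the device did not raise this interrupt. */
    if (!(dev->status & FAKE_DEV_IRQ_PENDING))
        return IRQ_RESULT_NONE;

    dev->status &= ~FAKE_DEV_IRQ_PENDING; /* acknowledge the cause */
    dev->events++;
    return IRQ_RESULT_HANDLED;
}
```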
      Reported-by: Evan Green <evgreen@chromium.org>
      Debugged-by: Evan Green <evgreen@chromium.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Evan Green <evgreen@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
  4. 31 Jan, 2020: 2 commits
  5. 29 Jan, 2020: 1 commit
  6. 28 Jan, 2020: 3 commits
  7. 26 Jan, 2020: 2 commits
  8. 25 Jan, 2020: 3 commits
  9. 24 Jan, 2020: 5 commits
    • KVM: x86: Move kvm_vcpu_init() invocation to common code · 987b2594
      Sean Christopherson committed
      Move the kvm_vcpu_{un}init() calls to common x86 code as an intermediate
      step to removing kvm_vcpu_{un}init() altogether.
      
      Note, VMX's alloc_apic_access_page() and init_rmode_identity_map() are
      per-VM allocations and are intentionally kept if vCPU creation fails.
      They are freed by kvm_arch_destroy_vm().
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Allocate vcpu struct in common x86 code · a9dd6f09
      Sean Christopherson committed
      Move allocation of VMX and SVM vcpus to common x86.  Although the struct
      being allocated is technically a VMX/SVM struct, it can be interpreted
      directly as a 'struct kvm_vcpu' because of the pre-existing requirement
      that 'struct kvm_vcpu' be located at offset zero of the arch/vendor vcpu
      struct.
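      The offset-zero rule is what lets common code allocate a vendor struct while only handing around a struct kvm_vcpu pointer. The sketch below uses heavily simplified stand-ins for struct kvm_vcpu and struct vcpu_vmx (the fields are illustrative, not KVM's), and alloc_vcpu() is a hypothetical helper; the static_assert plays the role of the build-time placement check the message mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins; fields are illustrative only. */
struct kvm_vcpu {
    int vcpu_id;
};

struct vcpu_vmx {
    struct kvm_vcpu vcpu;          /* must remain the first member */
    unsigned long vmx_private_state;
};

/* Build-time check that the common struct sits at offset zero. */
static_assert(offsetof(struct vcpu_vmx, vcpu) == 0,
              "struct kvm_vcpu must be at offset zero");

/* Common code only needs the vendor struct's size; the allocation can
 * then be used directly as a struct kvm_vcpu. */
static struct kvm_vcpu *alloc_vcpu(size_t vendor_size, int id)
{
    struct kvm_vcpu *vcpu = calloc(1, vendor_size);
    if (vcpu)
        vcpu->vcpu_id = id;
    return vcpu;
}

/* Vendor code recovers its own struct; with offset zero this is a cast
 * (equivalent to container_of(vcpu, struct vcpu_vmx, vcpu)). */
static struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu)
{
    return (struct vcpu_vmx *)vcpu;
}
```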
      
      Remove the message from the build-time assertions regarding placement of
      the struct, as compatibility with the arch usercopy region is no longer
      the sole dependent on 'struct kvm_vcpu' being at offset zero.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86/asm: add iosubmit_cmds512() based on MOVDIR64B CPU instruction · 232bb01b
      Dave Jiang committed
      With the introduction of the MOVDIR64B instruction, there is now an
      instruction that can write 64 bytes of data atomically.
      
      Quoting from Intel SDM:
      "There is no atomicity guarantee provided for the 64-byte load operation
      from source address, and processor implementations may use multiple
      load operations to read the 64-bytes. The 64-byte direct-store issued
      by MOVDIR64B guarantees 64-byte write-completion atomicity. This means
      that the data arrives at the destination in a single undivided 64-byte
      write transaction."
      
      We have identified at least 3 different use cases for this instruction in
      the format of func(dst, src, count):
      1) Clear poison / Initialize MKTME memory
         @dst is normal memory.
         @src is normal memory. Does not increment. (Copy same line to all
         targets)
         @count (to clear/init multiple lines)
      2) Submit command(s) to new devices
         @dst is a special MMIO region for a device. Does not increment.
         @src is normal memory. Increments.
         @count usually is 1, but can be multiple.
      3) Copy to iomem in big chunks
         @dst is iomem and increments
         @src is normal memory and increments
         @count is number of chunks to copy
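      The three use cases differ only in which pointer advances per 64-byte line. The following sketch expresses that as one hypothetical helper, copy_lines(), with a flag per side. memcpy stands in for MOVDIR64B, so unlike the real instruction this model gives no write atomicity; the wrapper names are also hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64

/* Copy `count` 64-byte lines; advance dst/src only if the flag is set.
 * memcpy stands in for MOVDIR64B (no atomicity in this model). */
static void copy_lines(void *dst, int dst_inc, const void *src, int src_inc,
                       size_t count)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < count; i++) {
        memcpy(d, s, LINE_SIZE);
        if (dst_inc)
            d += LINE_SIZE;
        if (src_inc)
            s += LINE_SIZE;
    }
}

/* Case 1: clear/init memory; the same source line goes to every target. */
static void clear_lines(void *dst, const void *pattern, size_t count)
{
    copy_lines(dst, 1, pattern, 0, count);
}

/* Case 2: submit descriptors to one fixed MMIO "portal" address. */
static void submit_lines(void *portal, const void *descs, size_t count)
{
    copy_lines(portal, 0, descs, 1, count);
}

/* Case 3: bulk copy to iomem; both sides advance. */
static void bulk_copy_lines(void *dst, const void *src, size_t count)
{
    copy_lines(dst, 1, src, 1, count);
}
```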
      
      Add support for case #2 to support devices that accept commands via
      this instruction. We provide a @count in order to submit a batch of
      preprogrammed descriptors in virtually contiguous memory. This allows
      the caller to submit multiple descriptors to a device with a single
      submission. The special device requires the entire 64-byte descriptor
      to be written atomically and accepts the MOVDIR64B instruction.
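      Batch submission for case #2 can be pictured as a loop that issues one 64-byte store per descriptor to a fixed destination. This is a sketch, not the kernel helper: struct desc64, struct portal_model, movdir64b_model() and submit_batch() are hypothetical, and movdir64b_model() records each store with memcpy where real hardware would perform a single atomic 64-byte write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 64-byte device descriptor. */
struct desc64 {
    uint64_t words[8];
};
static_assert(sizeof(struct desc64) == 64, "descriptor must be 64 bytes");

/* Hypothetical device portal that records each 64-byte store. */
struct portal_model {
    struct desc64 last;
    size_t stores;
};

/* Stand-in for one MOVDIR64B store (atomic on hardware, not here). */
static void movdir64b_model(struct portal_model *portal, const void *src)
{
    memcpy(&portal->last, src, sizeof(portal->last));
    portal->stores++;
}

/* Destination stays fixed; source advances 64 bytes per descriptor. */
static void submit_batch(struct portal_model *portal, const void *src,
                         size_t count)
{
    const uint8_t *from = src;
    const uint8_t *end = from + count * 64;

    while (from < end) {
        movdir64b_model(portal, from);
        from += 64;
    }
}
```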
      Signed-off-by: Dave Jiang <dave.jiang@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/157965022175.73301.10174614665472962675.stgit@djiang5-desk3.ch.intel.com
      Signed-off-by: Vinod Koul <vkoul@kernel.org>
    • x86/mpx: remove MPX from arch/x86 · 45fc24e8
      Dave Hansen committed
      From: Dave Hansen <dave.hansen@linux.intel.com>
      
      MPX is being removed from the kernel due to a lack of support
      in the toolchain going forward (gcc).
      
      This removes all the remaining (dead at this point) MPX handling
      code from the tree.  The only remaining code is the XSAVE
      support for MPX state, which is currently needed for KVM to handle
      VMs which might use MPX.
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    • mm: remove arch_bprm_mm_init() hook · 42222eae
      Dave Hansen committed
      From: Dave Hansen <dave.hansen@linux.intel.com>
      
      MPX is being removed from the kernel due to a lack of support
      in the toolchain going forward (gcc).
      
      arch_bprm_mm_init() is used at execve() time.  The only non-stub
      implementation is on x86 for MPX.  Remove the hook entirely from
      all architectures and generic code.
      
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: x86@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-arch@vger.kernel.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
  10. 23 Jan, 2020: 7 commits