1. 23 June 2021, 3 commits
  2. 22 June 2021, 1 commit
    • x86/fpu: Make init_fpstate correct with optimized XSAVE · f9dfb5e3
      Committed by Thomas Gleixner
      The XSAVE init code initializes all enabled and supported components with
      XRSTOR(S) to init state. Then it XSAVEs the state of the components back
      into init_fpstate which is used in several places to fill in the init state
      of components.
      
      This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
      those use the init optimization and skip writing state of components which
      are in init state. So init_fpstate.xsave still contains all zeroes after
      this operation.
      
      There are two ways to solve that:
      
         1) Use XSAVE unconditionally, but that requires reshuffling the buffer when
            XSAVES is enabled because XSAVES uses the compacted format.
      
         2) Save the components which are known to have a non-zero init state by other
            means.
      
      Looking deeper, #2 is the right thing to do because all components the
      kernel supports have all-zeroes init state except the legacy features (FP,
      SSE). Those cannot be hard coded because the states are not identical on all
      CPUs, but they can be saved with FXSAVE which avoids all conditionals.
      
      Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
      a BUILD_BUG_ON() which reminds developers to validate that a newly added
      component has all zeroes init state. As a bonus remove the now unused
      copy_xregs_to_kernel_booting() crutch.
      
      The XSAVE and reshuffle method can still be implemented in the unlikely
      case that components are added which have a non-zero init state and no
      other means to save them. For now, FXSAVE is just simple and good enough.
      
        [ bp: Fix a typo or two in the text. ]
      
      Fixes: 6bad06b7 ("x86, xsave: Use xsaveopt in context-switch path when supported")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
      f9dfb5e3
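
      As a rough illustration of the approach described in the commit above, here
      is a minimal C sketch (x86-64, GCC-style inline asm): only the legacy
      FP/SSE state is snapshotted with FXSAVE, and every extended component is
      assumed to keep an all-zeroes init state. The struct layout and names are
      simplified stand-ins, not the actual kernel code.

        struct fxregs_state {
                unsigned char data[512];        /* 512-byte legacy FP/SSE save area */
        } __attribute__((aligned(16)));         /* FXSAVE needs 16-byte alignment */

        static struct {
                struct fxregs_state fxsave;     /* the only non-zero init state */
                /* extended state components stay all zeroes */
        } init_fpstate;

        static void init_fpstate_snapshot_legacy(void)
        {
                /* FXSAVE writes the FP/SSE state unconditionally; it has no
                   init optimization, so it works where XSAVEOPT/XSAVES would
                   skip components that are in init state. */
                asm volatile("fxsaveq %[fx]" : [fx] "=m" (init_fpstate.fxsave));
        }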
  3. 09 June 2021, 2 commits
  4. 03 June 2021, 1 commit
    • x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() · 9bfecd05
      Committed by Thomas Gleixner
      While digesting the XSAVE-related horrors which got introduced with
      the supervisor/user split, the recent addition of ENQCMD-related
      functionality got on the radar and turned out to be similarly broken.
      
      update_pasid(), which is only required when X86_FEATURE_ENQCMD is
      available, is invoked from two places:
      
       1) From switch_to() for the incoming task
      
       2) Via an SMP function call from the IOMMU/SVM code
      
      #1 is halfway correct as it hacks around the brokenness of get_xsave_addr()
         by enforcing the state to be 'present', but all the conditionals in that
         code are completely pointless for that.
      
         Also the invocation is just useless overhead because at that point
         it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
         and all of this can be handled at return to user space.
      
      #2 is broken beyond repair. The comment in the code claims that it is safe
         to invoke this in an IPI, but that's just wishful thinking.
      
         FPU state of a running task is protected by fregs_lock() which is
         nothing else than a local_bh_disable(). As BH-disabled regions run
         usually with interrupts enabled the IPI can hit a code section which
         modifies FPU state and there is absolutely no guarantee that any of the
         assumptions which are made for the IPI case is true.
      
         Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
         invoked with a NULL pointer argument, so it can hit a completely
         unrelated task and unconditionally force an update for nothing.
         Worse, it can hit a kernel thread which operates on a user space
         address space and set a random PASID for it.
      
      The offending commit does not cleanly revert, but it's sufficient to
      force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
      code to make this dysfunctional all over the place. Anything more
      complex would require more surgery and none of the related functions
      outside of the x86 core code are blatantly wrong, so removing those
      would be overkill.
      
      As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
      required to make this actually work, this cannot result in a regression
      except for related out of tree train-wrecks, but they are broken already
      today.
      
      Fixes: 20f0afd1 ("x86/mmu: Allocate/free a PASID")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
      9bfecd05
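
      A hedged sketch of what "force disable" means in practice here: with the
      feature's bit set in the build-time disabled-features mask,
      cpu_feature_enabled(X86_FEATURE_ENQCMD) evaluates to a compile-time false
      and every dependent path is dead-code eliminated. The macro follows the
      disabled-features.h pattern and the helper below is illustrative, not the
      patch itself.

        /* Set the bit unconditionally, regardless of what CPUID reports
           (simplified shape of arch/x86/include/asm/disabled-features.h). */
        #define DISABLE_ENQCMD  (1 << (X86_FEATURE_ENQCMD & 31))

        static void enqcmd_dependent_path(void)
        {
                /* Constant false while the feature is force-disabled, so the
                   PASID/ENQCMD handling below never runs and is compiled out. */
                if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
                        return;

                /* ... MSR_IA32_PASID programming would go here ... */
        }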
  5. 01 June 2021, 1 commit
    • x86/thermal: Fix LVT thermal setup for SMI delivery mode · 9a90ed06
      Committed by Borislav Petkov
      There are machines out there with added value crap^WBIOS which provide an
      SMI handler for the local APIC thermal sensor interrupt. Out of reset,
      the BSP on those machines has something like 0x200 in that APIC register
      (timestamps left in because this whole issue is timing sensitive):
      
        [    0.033858] read lvtthmr: 0x330, val: 0x200
      
      which means:
      
       - bit 16 - the interrupt mask bit is clear and thus that interrupt is enabled
       - bits [10:8] have 010b which means SMI delivery mode.
      
      Now, later during boot, when the kernel programs the local APIC, it
      soft-disables it temporarily through the spurious vector register:
      
        setup_local_APIC:
      
        	...
      
      	/*
      	 * If this comes from kexec/kcrash the APIC might be enabled in
      	 * SPIV. Soft disable it before doing further initialization.
      	 */
      	value = apic_read(APIC_SPIV);
      	value &= ~APIC_SPIV_APIC_ENABLED;
      	apic_write(APIC_SPIV, value);
      
      which means (from the SDM):
      
      "10.4.7.2 Local APIC State After It Has Been Software Disabled
      
      ...
      
      * The mask bits for all the LVT entries are set. Attempts to reset these
      bits will be ignored."
      
      And this happens too:
      
        [    0.124111] APIC: Switch to symmetric I/O mode setup
        [    0.124117] lvtthmr 0x200 before write 0xf to APIC 0xf0
        [    0.124118] lvtthmr 0x10200 after write 0xf to APIC 0xf0
      
      This results in CPU 0 soft lockups depending on the placement in time
      when the APIC soft-disable happens. Those soft lockups are not 100%
      reproducible and the reason for that can only be speculated as no one
      tells you what SMM does. Likely, it confuses the SMM code that the APIC
       is disabled and the thermal interrupt doesn't fire at all,
      leading to CPU 0 stuck in SMM forever...
      
      Now, before
      
        4f432e8b ("x86/mce: Get rid of mcheck_intel_therm_init()")
      
      due to how the APIC_LVTTHMR was read before APIC initialization in
      mcheck_intel_therm_init(), it would read the value with the mask bit 16
      clear and then intel_init_thermal() would replicate it onto the APs and
      all would be peachy - the thermal interrupt would remain enabled.
      
      But that commit moved that reading to a later moment in
      intel_init_thermal(), resulting in reading APIC_LVTTHMR on the BSP too
      late and with its interrupt mask bit set.
      
      Thus, revert back to the old behavior of reading the thermal LVT
      register before the APIC gets initialized.
      
      Fixes: 4f432e8b ("x86/mce: Get rid of mcheck_intel_therm_init()")
      Reported-by: James Feeney <james@nurealm.net>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: <stable@vger.kernel.org>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Link: https://lkml.kernel.org/r/YKIqDdFNaXYd39wz@zn.tnic
      9a90ed06
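
      For reference, the BIOS setup the commit describes can be detected with a
      small helper like the sketch below. APIC_LVTTHMR, APIC_MODE_MASK,
      APIC_DM_SMI, APIC_LVT_MASKED and apic_read() are the kernel's existing
      names; the two functions are illustrative, the point being that the BSP's
      thermal LVT must be read and cached before setup_local_APIC() sets the
      LVT mask bits.

        static u32 lvtthmr_init;        /* BSP value cached before APIC setup */

        static void thermal_lvt_cache_bsp(void)
        {
                /* Must run before setup_local_APIC() soft-disables the APIC
                   and thereby forces all LVT mask bits to 1. */
                lvtthmr_init = apic_read(APIC_LVTTHMR);
        }

        static bool bios_routes_thermal_to_smi(void)
        {
                /* bits [10:8] == 010b -> SMI delivery mode,
                   bit 16 clear       -> interrupt unmasked */
                return (lvtthmr_init & APIC_MODE_MASK) == APIC_DM_SMI &&
                       !(lvtthmr_init & APIC_LVT_MASKED);
        }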
  6. 29 May 2021, 1 commit
    • x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing · 7d65f9e8
      Committed by Thomas Gleixner
      PIC interrupts do not support affinity setting and they can end up on
      any online CPU. Therefore, it's required to mark the associated vectors
      as system-wide reserved. Otherwise, the corresponding irq descriptors
      are copied to the secondary CPUs but the vectors are not marked as
      assigned or reserved. This works correctly for the IO/APIC case.
      
      When the IO/APIC is disabled via config, kernel command line or lack of
      enumeration then all legacy interrupts are routed through the PIC, but
      nothing marks them as system-wide reserved vectors.
      
      As a consequence, a subsequent allocation on a secondary CPU can result in
      allocating one of these vectors, which triggers the BUG() in
      apic_update_vector() because the interrupt descriptor slot is not empty.
      
      Imran tried to work around that by marking those interrupts as allocated
      when a CPU comes online. But that's wrong when the IO/APIC is available
      and one of the legacy interrupts, e.g. IRQ0, has been switched to PIC
      mode, because marking them as allocated then fails as they are already
      marked as system vectors.
      
      Stay consistent and update the legacy vectors after attempting IO/APIC
      initialization, marking them as system vectors when no IO/APIC is
      available.
      
      Fixes: 69cde000 ("x86/vector: Use matrix allocator for vector assignment")
      Reported-by: Imran Khan <imran.f.khan@oracle.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210519233928.2157496-1-imran.f.khan@oracle.com
      7d65f9e8
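
      The shape of the fix can be sketched as follows. nr_legacy_irqs() and
      lapic_assign_legacy_vector() are existing kernel helpers; the wrapper and
      its call site are simplified assumptions. It runs after IO/APIC
      initialization has been attempted and only when no IO/APIC is available.

        static void reserve_legacy_pic_vectors(void)
        {
                unsigned int i;

                /* Without an IO/APIC the PIC owns all legacy interrupts and
                   they can fire on any online CPU, so reserve their vectors
                   system-wide instead of leaving them allocatable on
                   secondary CPUs. */
                for (i = 0; i < nr_legacy_irqs(); i++)
                        lapic_assign_legacy_vector(i, true);
        }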
  7. 27 May 2021, 1 commit
  8. 19 May 2021, 2 commits
  9. 14 May 2021, 1 commit
  10. 13 May 2021, 1 commit
  11. 10 May 2021, 3 commits
  12. 07 May 2021, 9 commits
  13. 06 May 2021, 3 commits
  14. 05 May 2021, 1 commit
  15. 03 May 2021, 1 commit
    • KVM: nSVM: fix few bugs in the vmcb02 caching logic · c74ad08f
      Committed by Maxim Levitsky
      * Define and use an invalid GPA (all ones) as the init value of the last
        and current nested vmcb physical addresses.
      
      * Reset the current vmcb12 GPA to the invalid value when leaving
        nested mode, similar to what is done on nested vmexit.
      
      * Reset the last seen vmcb12 address when disabling nested SVM,
        as it relies on vmcb02 fields which are freed at that point.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c74ad08f
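
      A compact sketch of the caching rule the three bullets above describe.
      The all-ones INVALID_GPA marker matches the commit; the struct and field
      names are simplified stand-ins for the nested SVM state, not the actual
      KVM code.

        #define INVALID_GPA     (~0ULL)         /* all ones: no vmcb12 cached */

        struct nested_vmcb_cache {
                u64 vmcb12_gpa;                 /* currently mapped vmcb12 */
                u64 last_vmcb12_gpa;            /* last seen vmcb12, for reuse checks */
        };

        static void nested_cache_reset(struct nested_vmcb_cache *c)
        {
                /* Called on nested vmexit, when leaving nested mode and when
                   nested SVM is disabled, so a stale GPA can never match. */
                c->vmcb12_gpa      = INVALID_GPA;
                c->last_vmcb12_gpa = INVALID_GPA;
        }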
  16. 01 May 2021, 3 commits
  17. 26 April 2021, 1 commit
  18. 22 April 2021, 1 commit
    • KVM: x86: Support KVM VMs sharing SEV context · 54526d1f
      Committed by Nathan Tempelman
      Add a capability for userspace to mirror SEV encryption context from
      one VM to another. On our side, this is intended to support a
      Migration Helper vCPU, but it can also be used generically to support
      other in-guest workloads scheduled by the host. The intention is for
      the primary guest and the mirror to have nearly identical memslots.
      
      The primary benefits of this are that:
      1) The VMs do not share KVM contexts (think APIC/MSRs/etc), so they
      can't accidentally clobber each other.
      2) The VMs can have different memory-views, which is necessary for post-copy
      migration (the migration vCPUs on the target need to read and write to
      pages, when the primary guest would VMEXIT).
      
      This does not change the threat model for AMD SEV. Any memory involved
      is still owned by the primary guest and its initial state is still
      attested to through the normal SEV_LAUNCH_* flows. If userspace wanted
      to circumvent SEV, they could achieve the same effect by simply attaching
      a vCPU to the primary VM.
      This patch deliberately leaves userspace in charge of the memslots for the
      mirror, as it already has the power to mess with them in the primary guest.
      
      This patch does not support SEV-ES (much less SNP), as it does not
      handle handing off attested VMSAs to the mirror.
      
      For additional context, we need a Migration Helper because SEV PSP
      migration is far too slow for our live migration on its own. Using
      an in-guest migrator lets us speed this up significantly.
      Signed-off-by: Nathan Tempelman <natet@google.com>
      Message-Id: <20210408223214.2582277-1-natet@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      54526d1f
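
      A hedged userspace sketch of how the capability added here would be used,
      assuming it is enabled with a VM-scoped KVM_ENABLE_CAP on the mirror and
      the primary VM's fd passed in args[0]; error handling and the rest of the
      VM setup are omitted.

        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        /* Create a mirror VM sharing the SEV encryption context of
           primary_vm_fd. Returns the mirror VM fd, or -1 on error. */
        static int create_sev_mirror_vm(int kvm_fd, int primary_vm_fd)
        {
                struct kvm_enable_cap cap = {
                        .cap     = KVM_CAP_VM_COPY_ENC_CONTEXT_FROM,
                        .args[0] = (unsigned long long)primary_vm_fd,
                };
                int mirror_vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, 0);

                if (mirror_vm_fd < 0)
                        return -1;

                /* VM-scoped enable: the mirror now uses the primary's SEV
                   context; memslots and vCPUs stay entirely under userspace
                   control, as the commit message stresses. */
                if (ioctl(mirror_vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                        return -1;

                return mirror_vm_fd;
        }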
  19. 21 April 2021, 2 commits
  20. 20 April 2021, 2 commits