1. 13 Sep 2021, 1 commit
    • x86/extable: Rework the exception table mechanics · 46d28947
      Committed by Thomas Gleixner
      The exception table entries contain the instruction address, the fixup
      address and the handler address. All addresses are relative. Storing the
      handler address has a few downsides:
      
       1) Most handlers need to be exported
      
       2) Handlers can be defined everywhere and there is no overview about the
          handler types
      
       3) MCE needs to check the handler type to decide whether an in-kernel #MC
          can be recovered. The functionality of the handler itself is not in any
          way special, but for these checks there need to be separate functions
          which in the worst case have to be exported.
      
          Some of these 'recoverable' exception fixups are pretty obscure and
          just reuse some other handler to spare code. That obfuscates e.g. the
          #MC safe copy functions. Cleaning that up would require more handlers
          and exports.
      
      Rework the exception fixup mechanics by storing a fixup type number instead
      of the handler address and invoke the proper handler for each fixup
      type. Also teach the extable sort to leave the type field alone.
      
      This makes most handlers static, except for special cases like the MCE
      MSR fixup and the BPF fixup. This allows adding more types to clean up
      the obscure places without adding more handler code and exports.
      
      There is a marginal code size reduction for a production config and it
      removes _eight_ exported symbols.
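      
      In outline, the reworked mechanics look like the sketch below. The
      enum values and handler names follow the upstream pattern, but the
      struct layout and dispatch body are simplified, not upstream verbatim:
      
        struct exception_table_entry {
                int insn;       /* relative address of the faulting insn */
                int fixup;      /* relative address of the fixup code */
                int type;       /* fixup type instead of a handler address */
        };
      
        enum ex_fixup_type {
                EX_TYPE_NONE,
                EX_TYPE_DEFAULT,        /* plain jump to the fixup address */
                EX_TYPE_FAULT,          /* store the trap number, then jump */
                EX_TYPE_UACCESS,        /* user access fault */
        };
      
        int fixup_exception(struct pt_regs *regs, int trapnr,
                            unsigned long error_code, unsigned long fault_addr)
        {
                const struct exception_table_entry *e;
      
                e = search_exception_tables(regs->ip);
                if (!e)
                        return 0;
      
                /* One central dispatch; the handlers can now be static. */
                switch (e->type) {
                case EX_TYPE_DEFAULT:
                        return ex_handler_default(e, regs);
                case EX_TYPE_FAULT:
                        return ex_handler_fault(e, regs, trapnr);
                case EX_TYPE_UACCESS:
                        return ex_handler_uaccess(e, regs, trapnr);
                }
                BUG();
        }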
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lkml.kernel.org/r/20210908132525.211958725@linutronix.de
  2. 24 Jun 2021, 15 commits
  3. 23 Jun 2021, 6 commits
  4. 22 Jun 2021, 1 commit
    • x86/fpu: Make init_fpstate correct with optimized XSAVE · f9dfb5e3
      Committed by Thomas Gleixner
      The XSAVE init code initializes all enabled and supported components with
      XRSTOR(S) to init state. Then it XSAVEs the state of the components back
      into init_fpstate which is used in several places to fill in the init state
      of components.
      
      This works correctly with XSAVE, but not with XSAVEOPT and XSAVES because
      those use the init optimization and skip writing state of components which
      are in init state. So init_fpstate.xsave still contains all zeroes after
      this operation.
      
      There are two ways to solve that:
      
         1) Use XSAVE unconditionally, but that requires reshuffling the buffer when
            XSAVES is enabled because XSAVES uses the compacted format.
      
         2) Save the components which are known to have a non-zero init state by other
            means.
      
      Looking deeper, #2 is the right thing to do because all components the
      kernel supports have an all-zeroes init state, except the legacy features
      (FP, SSE). Those cannot be hard-coded because the states are not identical
      on all CPUs, but they can be saved with FXSAVE, which avoids all conditionals.
      
      Use FXSAVE to save the legacy FP/SSE components in init_fpstate along with
      a BUILD_BUG_ON() which reminds developers to validate that a newly added
      component has an all-zeroes init state. As a bonus, remove the now-unused
      copy_xregs_to_kernel_booting() crutch.
      
      The XSAVE and reshuffle method can still be implemented in the unlikely
      case that components are added which have a non-zero init state and no
      other means to save them. For now, FXSAVE is just simple and good enough.
      
        [ bp: Fix a typo or two in the text. ]
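      
      A minimal sketch of the resulting init flow, assuming the usual
      XFEATURE_MASK_FP/SSE constants; the helper name and the exact
      BUILD_BUG_ON() condition are illustrative, not upstream verbatim:
      
        static void __init fpstate_init_legacy(void)
        {
                /* Reminder to validate that any newly added component
                 * has an all-zeroes init state (condition simplified). */
                BUILD_BUG_ON(XFEATURE_MASK_FPSSE !=
                             (XFEATURE_MASK_FP | XFEATURE_MASK_SSE));
      
                /* FXSAVE captures the non-zero legacy FP/SSE init state
                 * and is immune to the init optimization which makes
                 * XSAVEOPT/XSAVES skip components in init state. */
                asm volatile("fxsave %0" : "=m" (init_fpstate.fxsave));
        }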
      
      Fixes: 6bad06b7 ("x86, xsave: Use xsaveopt in context-switch path when supported")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20210618143444.587311343@linutronix.de
  5. 09 Jun 2021, 2 commits
  6. 03 Jun 2021, 1 commit
    • x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid() · 9bfecd05
      Committed by Thomas Gleixner
      While digesting the XSAVE-related horrors which got introduced with
      the supervisor/user split, the recent addition of ENQCMD-related
      functionality got on the radar and turned out to be similarly broken.
      
      update_pasid(), which is only required when X86_FEATURE_ENQCMD is
      available, is invoked from two places:
      
       1) From switch_to() for the incoming task
      
       2) Via an SMP function call from the IOMMU/SVM code
      
      #1 is halfway correct as it hacks around the brokenness of get_xsave_addr()
         by enforcing the state to be 'present', but all the conditionals in that
         code are completely pointless for that.
      
         Also the invocation is just useless overhead because at that point
         it's guaranteed that TIF_NEED_FPU_LOAD is set on the incoming task
         and all of this can be handled at return to user space.
      
      #2 is broken beyond repair. The comment in the code claims that it is safe
         to invoke this in an IPI, but that's just wishful thinking.
      
         FPU state of a running task is protected by fpregs_lock(), which is
         nothing else than a local_bh_disable(). As BH-disabled regions usually
         run with interrupts enabled, the IPI can hit a code section which
         modifies FPU state, and there is absolutely no guarantee that any of
         the assumptions made for the IPI case are true.
      
         Also the IPI is sent to all CPUs in mm_cpumask(mm), but the IPI is
         invoked with a NULL pointer argument, so it can hit a completely
         unrelated task and unconditionally force an update for nothing.
         Worse, it can hit a kernel thread which operates on a user space
         address space and set a random PASID for it.
      
      The offending commit does not cleanly revert, but it's sufficient to
      force disable X86_FEATURE_ENQCMD and to remove the broken update_pasid()
      code to make this dysfunctional all over the place. Anything more
      complex would require more surgery and none of the related functions
      outside of the x86 core code are blatantly wrong, so removing those
      would be overkill.
      
      As nothing enables the PASID bit in the IA32_XSS MSR yet, which is
      required to make this actually work, this cannot result in a regression
      except for related out-of-tree train-wrecks, but those are broken already
      today.
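      
      For reference, the force-disable goes through the build-time
      disabled-features mask; a sketch along the lines of
      arch/x86/include/asm/disabled-features.h (the other DISABLED_MASK16
      bits shown here are the surrounding ones, abbreviated):
      
        /* X86_FEATURE_ENQCMD lives in word 16 of the cpufeature bitmap. */
        #define DISABLE_ENQCMD  (1 << (X86_FEATURE_ENQCMD & 31))
      
        /* With the bit set here, cpu_feature_enabled(X86_FEATURE_ENQCMD)
         * becomes compile-time false and all ENQCMD paths are dead code. */
        #define DISABLED_MASK16 (DISABLE_PKU | DISABLE_OSPKE | DISABLE_LA57 | \
                                 DISABLE_UMIP | DISABLE_ENQCMD)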
      
      Fixes: 20f0afd1 ("x86/mmu: Allocate/free a PASID")
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/87mtsd6gr9.ffs@nanos.tec.linutronix.de
  7. 18 Sep 2020, 1 commit
  8. 08 Sep 2020, 1 commit
  9. 18 Aug 2020, 1 commit
  10. 08 Jul 2020, 1 commit
    • x86/fpu: Use proper mask to replace full instruction mask · a063bf24
      Committed by Kan Liang
      When saving xstate to a kernel/user XSAVE area with the XSAVE family of
      instructions, the current code applies the 'full' instruction mask (-1),
      which tries to XSAVE all possible features. This method relies on
      hardware to trim 'all possible' down to what is enabled in the
      hardware. The code works well for now. However, a problem arises if
      some features are enabled in hardware but are not suitable to be saved
      into every kernel XSAVE buffer, like task->fpu, due to performance
      considerations.
      
      One such example is the Last Branch Records (LBR) state. The LBR state
      only contains valuable information when LBR is explicitly enabled by
      the perf subsystem, and the size of an LBR state is large (808 bytes
      for now). To avoid both CPU overhead and space overhead at each context
      switch, the LBR state should not be saved into task->fpu like other
      state components. It should be saved/restored on demand when LBR is
      enabled in the perf subsystem. The current copy_xregs_to_*() code would
      trigger a buffer overflow in such cases.
      
      Three sites use the '-1' instruction mask which must be updated.
      
      Two are saving/restoring the xstate to/from a kernel-allocated XSAVE
      buffer and can use 'xfeatures_mask_all', which will save/restore all of
      the features present in a normal task FPU buffer.
      
      The last one saves the register state directly to a user buffer. It
      could also use 'xfeatures_mask_all'. Just as it was with the '-1'
      argument, any supervisor states in the mask will be filtered out by the
      hardware and not saved to the buffer. But, to be more explicit about
      what is expected to be saved, use xfeatures_mask_user() for the
      instruction mask.
      
      KVM includes the header file fpu/internal.h. To avoid an 'undefined
      xfeatures_mask_all' compile error, move copy_fpregs_to_fpstate() to
      fpu/core.c and export it, because:
      - xfeatures_mask_all is used by KVM only indirectly, via
        copy_fpregs_to_fpstate(). The function which is directly used by
        other modules should be the one that is exported.
      - copy_fpregs_to_fpstate() is a function, while xfeatures_mask_all
        is a variable for the "internal" FPU state. It's safer to export a
        function than a variable, which may be implicitly changed by others.
      - copy_fpregs_to_fpstate() is a big function with many checks. The
        removal of the inline keyword should not impact performance.
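      
      For the first two sites, the change amounts to replacing the '-1'
      mask with xfeatures_mask_all before it is split into the EDX:EAX
      pair the instruction expects; a simplified sketch (XSTATE_XSAVE is
      the kernel's alternatives-based XSAVE/XSAVEOPT/XSAVES dispatch):
      
        static inline void copy_xregs_to_kernel(struct xregs_state *xstate)
        {
                u64 mask = xfeatures_mask_all;  /* was: u64 mask = -1; */
                u32 lmask = mask;               /* low 32 bits -> EAX */
                u32 hmask = mask >> 32;         /* high 32 bits -> EDX */
                int err;
      
                XSTATE_XSAVE(xstate, lmask, hmask, err);
      
                /* Failure here means the mask requested features the
                 * hardware does not have enabled. */
                WARN_ON_ONCE(err);
        }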
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Link: https://lkml.kernel.org/r/1593780569-62993-20-git-send-email-kan.liang@linux.intel.com
  11. 29 Jun 2020, 1 commit
  12. 26 Jun 2020, 1 commit
  13. 14 May 2020, 1 commit
  14. 13 May 2020, 2 commits
  15. 28 Nov 2019, 1 commit
    • x86/fpu: Don't cache access to fpu_fpregs_owner_ctx · 59c4bd85
      Committed by Sebastian Andrzej Siewior
      The state/owner of the FPU is saved to fpu_fpregs_owner_ctx by pointing
      to the context that is currently loaded. It never changed during the
      lifetime of a task - it remained stable/constant.
      
      After FPU register loading was deferred until return to userland, the
      content of fpu_fpregs_owner_ctx may change during preemption and must
      not be cached.
      
      This went unnoticed for some time and surfaced now in particular
      because gcc 9 caches that load in copy_fpstate_to_sigframe() and reuses
      it in the retry loop:
      
        copy_fpstate_to_sigframe()
          load fpu_fpregs_owner_ctx and save on stack
          fpregs_lock()
          copy_fpregs_to_sigframe() /* failed */
          fpregs_unlock()
             *** PREEMPTION: another task uses the FPU and changes fpu_fpregs_owner_ctx ***
      
          fault_in_pages_writeable() /* succeeds, retry */
      
          fpregs_lock()
      	__fpregs_load_activate()
      	  fpregs_state_valid() /* uses fpu_fpregs_owner_ctx from stack */
          copy_fpregs_to_sigframe() /* succeeds, random FPU content */
      
      This is a comparison of the assembly produced by gcc 9, without vs with this
      patch:
      
      | # arch/x86/kernel/fpu/signal.c:173:      if (!access_ok(buf, size))
      |        cmpq    %rdx, %rax      # tmp183, _4
      |        jb      .L190   #,
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-#APP
      |-# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |-       movq %gs:fpu_fpregs_owner_ctx,%rax      #, pfo_ret__
      |-# 0 "" 2
      |-#NO_APP
      |-       movq    %rax, -88(%rbp) # pfo_ret__, %sfp
      …
      |-# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read_stable(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |-       movq    -88(%rbp), %rcx # %sfp, pfo_ret__
      |-       cmpq    %rcx, -64(%rbp) # pfo_ret__, %sfp
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#APP
      |+# 512 "arch/x86/include/asm/fpu/internal.h" 1
      |+       movq %gs:fpu_fpregs_owner_ctx(%rip),%rax        # fpu_fpregs_owner_ctx, pfo_ret__
      |+# 0 "" 2
      |+# arch/x86/include/asm/fpu/internal.h:512:       return fpu == this_cpu_read(fpu_fpregs_owner_ctx) && cpu == fpu->last_cpu;
      |+#NO_APP
      |+       cmpq    %rax, -64(%rbp) # pfo_ret__, %sfp
      
      Use this_cpu_read() instead of this_cpu_read_stable() to avoid caching
      fpu_fpregs_owner_ctx across preemption points.
      
      The Fixes: tag points to the commit where deferred FPU loading was
      added. Since that commit, the compiler must no longer be allowed to
      move the load of fpu_fpregs_owner_ctx somewhere else / outside of the
      locked section: a task preemption will change its value and stale
      content would be observed.
      
       [ bp: Massage. ]
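      
      The fix itself is a one-liner; in context it looks roughly like
      this (fpregs_state_valid() from fpu/internal.h, simplified):
      
        static inline int fpregs_state_valid(struct fpu *fpu, unsigned int cpu)
        {
                /*
                 * this_cpu_read() forbids the compiler from caching the
                 * per-CPU load across preemption points, whereas
                 * this_cpu_read_stable() permitted exactly the stale
                 * reuse shown in the trace above.
                 */
                return fpu == this_cpu_read(fpu_fpregs_owner_ctx) &&
                       cpu == fpu->last_cpu;
        }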
      Debugged-by: Austin Clements <austin@google.com>
      Debugged-by: David Chase <drchase@golang.org>
      Debugged-by: Ian Lance Taylor <ian@airs.com>
      Fixes: 5f409e20 ("x86/fpu: Defer FPU state load until return to userspace")
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Rik van Riel <riel@surriel.com>
      Tested-by: Borislav Petkov <bp@suse.de>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Austin Clements <austin@google.com>
      Cc: Barret Rhoden <brho@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Chase <drchase@golang.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: ian@airs.com
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Josh Bleecher Snyder <josharian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20191128085306.hxfa2o3knqtu4wfn@linutronix.de
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=205663
  16. 14 Jun 2019, 1 commit
  17. 13 Apr 2019, 1 commit
    • x86/fpu: Defer FPU state load until return to userspace · 5f409e20
      Committed by Rik van Riel
      Defer loading of FPU state until return to userspace. This gives
      the kernel the potential to skip loading FPU state for tasks that
      stay in kernel mode, or for tasks that end up with repeated
      invocations of kernel_fpu_begin() & kernel_fpu_end().
      
      The fpregs_lock/unlock() section ensures that the registers remain
      unchanged. Otherwise a context switch or a bottom half could save the
      registers to its FPU context and the processor's FPU registers would
      become random if modified at the same time.
      
      KVM swaps the host/guest registers on the entry/exit path. This flow
      has been kept as-is. First it ensures that the registers are loaded and
      then saves the current (host) state before it loads the guest's
      registers. The swap is done at the very end with disabled interrupts so
      it should not change anymore before the guest is entered. The read/save
      version seems to be cheaper compared to memcpy() in a micro benchmark.
      
      Each thread gets TIF_NEED_FPU_LOAD set as part of fork() / fpu__copy().
      For kernel threads, this flag never gets cleared, which avoids saving /
      restoring the FPU state for kernel threads and during in-kernel usage of
      the FPU registers.
      
       [
         bp: Correct and update commit message and fix checkpatch warnings.
         s/register/registers/ where it is used in plural.
         minor comment corrections.
         remove unused trace_x86_fpu_activate_state() TP.
       ]
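      
      A sketch of the usage pattern described above; the wrapper function
      is illustrative, while fpregs_lock()/fpregs_unlock() and
      __fpregs_load_activate() are this series' primitives:
      
        /* Any in-kernel touch of the task's FPU registers must happen
         * inside a fpregs_lock() section (a local_bh_disable() region),
         * so neither a context switch nor a softirq can clobber them. */
        static void touch_user_fpstate(void)
        {
                fpregs_lock();
      
                /* The registers may be stale if the load was deferred. */
                if (test_thread_flag(TIF_NEED_FPU_LOAD))
                        __fpregs_load_activate();
      
                /* ... use/modify the FPU registers safely here ... */
      
                fpregs_unlock();
        }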
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: Babu Moger <Babu.Moger@amd.com>
      Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
      Cc: Dmitry Safonov <dima@arista.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Waiman Long <longman@redhat.com>
      Cc: x86-ml <x86@kernel.org>
      Cc: Yi Wang <wang.yi59@zte.com.cn>
      Link: https://lkml.kernel.org/r/20190403164156.19645-24-bigeasy@linutronix.de
  18. 12 Apr 2019, 1 commit
    • x86/fpu: Restore from kernel memory on the 64-bit path too · 926b21f3
      Committed by Sebastian Andrzej Siewior
      The 64-bit case (both 64-bit and 32-bit frames) loads the new state from
      user memory.
      
      However, doing this is not desired if the FPU state is going to be
      restored on return to userland: it would be required to disable
      preemption in order to avoid a context switch which would set
      TIF_NEED_FPU_LOAD. If this happens before the restore operation then the
      loaded registers would become volatile.
      
      Furthermore, disabling preemption while accessing user memory requires
      disabling the pagefault handler. An error during FXRSTOR would then
      mean that either a page fault occurred (and it would have to be retried
      with enabled page fault handler) or a #GP occurred because the xstate is
      bogus (after all, the signal handler can modify it).
      
      In order to avoid that mess, copy the FPU state from userland, validate
      it and then load it. The copy_kernel_…() helpers are basically just
      like the old helpers except that they operate on kernel memory and the
      fault handler just sets the error value and the caller handles it.
      
      copy_user_to_fpregs_zeroing() and its helpers remain and will be used
      later for a fastpath optimisation.
      
       [ bp: Clarify commit message. ]
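      
      In outline, the resulting flow is copy, validate, then load from
      kernel memory; a sketch with simplified signatures (the real
      helpers take more arguments and do more sanitizing):
      
        static int restore_sig_fpstate(void __user *buf, u64 xfeatures)
        {
                union fpregs_state *state = &current->thread.fpu.state;
      
                /* 1) Copy from userland, pagefault handler enabled. */
                if (__copy_from_user(state, buf, fpu_kernel_xstate_size))
                        return -EFAULT;
      
                /* 2) Validate so a bogus xstate cannot raise #GP. */
                if (validate_xstate_header(&state->xsave.header))
                        return -EINVAL;
      
                /* 3) Load from kernel memory; the fault handler just
                 *    sets an error value for the caller to handle. */
                return copy_kernel_to_xregs_err(&state->xsave, xfeatures);
        }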
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-22-bigeasy@linutronix.de
  19. 11 Apr 2019, 1 commit
    • x86/entry: Add TIF_NEED_FPU_LOAD · 383c2525
      Committed by Sebastian Andrzej Siewior
      Add TIF_NEED_FPU_LOAD. This flag is used for loading the FPU registers
      before returning to userland. It must not be set on systems without an
      FPU.
      
      If this flag is cleared, the CPU's FPU registers hold the latest,
      up-to-date content of the current task's (current()) FPU registers.
      The in-memory copy (union fpregs_state) is not valid.
      
      If this flag is set, then all of the CPU's FPU registers may hold a
      random value (except for PKRU) and the content of the FPU registers
      must be loaded on return to userland.
      
      Introduce it now as a preparatory change before adding the main feature.
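      
      To make the two states concrete, a sketch of how a return-to-user
      path consumes the flag (the function name is illustrative; the real
      clearing happens via the load-and-activate helper added later in
      this series):
      
        static void fpregs_load_if_needed(void)
        {
                /* The flag must never be set on systems without an FPU. */
                if (!static_cpu_has(X86_FEATURE_FPU))
                        return;
      
                if (test_thread_flag(TIF_NEED_FPU_LOAD)) {
                        /* Registers are stale: reload the in-memory copy. */
                        copy_kernel_to_fpregs(&current->thread.fpu.state);
                        clear_thread_flag(TIF_NEED_FPU_LOAD);
                }
                /* Otherwise the CPU registers already hold the latest
                 * state and the in-memory copy is the stale one. */
        }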
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aubrey Li <aubrey.li@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: kvm ML <kvm@vger.kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190403164156.19645-17-bigeasy@linutronix.de