1. 13 11月, 2019 1 次提交
    • P
      x86/speculation/taa: Add mitigation for TSX Async Abort · 6c58ea85
      Pawan Gupta 提交于
      commit 1b42f017415b46c317e71d41c34ec088417a1883 upstream.
      
      TSX Async Abort (TAA) is a side channel vulnerability to the internal
      buffers in some Intel processors similar to Microachitectural Data
      Sampling (MDS). In this case, certain loads may speculatively pass
      invalid data to dependent operations when an asynchronous abort
      condition is pending in a TSX transaction.
      
      This includes loads with no fault or assist condition. Such loads may
      speculatively expose stale data from the uarch data structures as in
      MDS. Scope of exposure is within the same-thread and cross-thread. This
      issue affects all current processors that support TSX, but do not have
      ARCH_CAP_TAA_NO (bit 8) set in MSR_IA32_ARCH_CAPABILITIES.
      
      On CPUs which have their IA32_ARCH_CAPABILITIES MSR bit MDS_NO=0,
      CPUID.MD_CLEAR=1 and the MDS mitigation is clearing the CPU buffers
      using VERW or L1D_FLUSH, there is no additional mitigation needed for
      TAA. On affected CPUs with MDS_NO=1 this issue can be mitigated by
      disabling the Transactional Synchronization Extensions (TSX) feature.
      
      A new MSR IA32_TSX_CTRL in future and current processors after a
      microcode update can be used to control the TSX feature. There are two
      bits in that MSR:
      
      * TSX_CTRL_RTM_DISABLE disables the TSX sub-feature Restricted
      Transactional Memory (RTM).
      
      * TSX_CTRL_CPUID_CLEAR clears the RTM enumeration in CPUID. The other
      TSX sub-feature, Hardware Lock Elision (HLE), is unconditionally
      disabled with updated microcode but still enumerated as present by
      CPUID(EAX=7).EBX{bit4}.
      
      The second mitigation approach is similar to MDS which is clearing the
      affected CPU buffers on return to user space and when entering a guest.
      Relevant microcode update is required for the mitigation to work.  More
      details on this approach can be found here:
      
        https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/mds.html
      
      The TSX feature can be controlled by the "tsx" command line parameter.
      If it is force-enabled then "Clear CPU buffers" (MDS mitigation) is
      deployed. The effective mitigation state can be read from sysfs.
      
       [ bp:
         - massage + comments cleanup
         - s/TAA_MITIGATION_TSX_DISABLE/TAA_MITIGATION_TSX_DISABLED/g - Josh.
         - remove partial TAA mitigation in update_mds_branch_idle() - Josh.
         - s/tsx_async_abort_cmdline/tsx_async_abort_parse_cmdline/g
       ]
      Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6c58ea85
  2. 29 8月, 2019 1 次提交
    • S
      x86/retpoline: Don't clobber RFLAGS during CALL_NOSPEC on i386 · f9747104
      Sean Christopherson 提交于
      commit b63f20a778c88b6a04458ed6ffc69da953d3a109 upstream.
      
      Use 'lea' instead of 'add' when adjusting %rsp in CALL_NOSPEC so as to
      avoid clobbering flags.
      
      KVM's emulator makes indirect calls into a jump table of sorts, where
      the destination of the CALL_NOSPEC is a small blob of code that performs
      fast emulation by executing the target instruction with fixed operands.
      
        adcb_al_dl:
           0x000339f8 <+0>:   adc    %dl,%al
           0x000339fa <+2>:   ret
      
      A major motiviation for doing fast emulation is to leverage the CPU to
      handle consumption and manipulation of arithmetic flags, i.e. RFLAGS is
      both an input and output to the target of CALL_NOSPEC.  Clobbering flags
      results in all sorts of incorrect emulation, e.g. Jcc instructions often
      take the wrong path.  Sans the nops...
      
        asm("push %[flags]; popf; " CALL_NOSPEC " ; pushf; pop %[flags]\n"
           0x0003595a <+58>:  mov    0xc0(%ebx),%eax
           0x00035960 <+64>:  mov    0x60(%ebx),%edx
           0x00035963 <+67>:  mov    0x90(%ebx),%ecx
           0x00035969 <+73>:  push   %edi
           0x0003596a <+74>:  popf
           0x0003596b <+75>:  call   *%esi
           0x000359a0 <+128>: pushf
           0x000359a1 <+129>: pop    %edi
           0x000359a2 <+130>: mov    %eax,0xc0(%ebx)
           0x000359b1 <+145>: mov    %edx,0x60(%ebx)
      
        ctxt->eflags = (ctxt->eflags & ~EFLAGS_MASK) | (flags & EFLAGS_MASK);
           0x000359a8 <+136>: mov    -0x10(%ebp),%eax
           0x000359ab <+139>: and    $0x8d5,%edi
           0x000359b4 <+148>: and    $0xfffff72a,%eax
           0x000359b9 <+153>: or     %eax,%edi
           0x000359bd <+157>: mov    %edi,0x4(%ebx)
      
      For the most part this has gone unnoticed as emulation of guest code
      that can trigger fast emulation is effectively limited to MMIO when
      running on modern hardware, and MMIO is rarely, if ever, accessed by
      instructions that affect or consume flags.
      
      Breakage is almost instantaneous when running with unrestricted guest
      disabled, in which case KVM must emulate all instructions when the guest
      has invalid state, e.g. when the guest is in Big Real Mode during early
      BIOS.
      
      Fixes: 776b043848fd2 ("x86/retpoline: Add initial retpoline support")
      Fixes: 1a29b5b7 ("KVM: x86: Make indirect calls in emulator speculation safe")
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20190822211122.27579-1-sean.j.christopherson@intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f9747104
  3. 15 5月, 2019 3 次提交
    • T
      x86/speculation/mds: Conditionally clear CPU buffers on idle entry · 4df98b3f
      Thomas Gleixner 提交于
      commit 07f07f55a29cb705e221eda7894dd67ab81ef343 upstream
      
      Add a static key which controls the invocation of the CPU buffer clear
      mechanism on idle entry. This is independent of other MDS mitigations
      because the idle entry invocation to mitigate the potential leakage due to
      store buffer repartitioning is only necessary on SMT systems.
      
      Add the actual invocations to the different halt/mwait variants which
      covers all usage sites. mwaitx is not patched as it's not available on
      Intel CPUs.
      
      The buffer clear is only invoked before entering the C-State to prevent
      that stale data from the idling CPU is spilled to the Hyper-Thread sibling
      after the Store buffer got repartitioned and all entries are available to
      the non idle sibling.
      
      When coming out of idle the store buffer is partitioned again so each
      sibling has half of it available. Now CPU which returned from idle could be
      speculatively exposed to contents of the sibling, but the buffers are
      flushed either on exit to user space or on VMENTER.
      
      When later on conditional buffer clearing is implemented on top of this,
      then there is no action required either because before returning to user
      space the context switch will set the condition flag which causes a flush
      on the return to user path.
      
      Note, that the buffer clearing on idle is only sensible on CPUs which are
      solely affected by MSBDS and not any other variant of MDS because the other
      MDS variants cannot be mitigated when SMT is enabled, so the buffer
      clearing on idle would be a window dressing exercise.
      
      This intentionally does not handle the case in the acpi/processor_idle
      driver which uses the legacy IO port interface for C-State transitions for
      two reasons:
      
       - The acpi/processor_idle driver was replaced by the intel_idle driver
         almost a decade ago. Anything Nehalem upwards supports it and defaults
         to that new driver.
      
       - The legacy IO port interface is likely to be used on older and therefore
         unaffected CPUs or on systems which do not receive microcode updates
         anymore, so there is no point in adding that.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
      Reviewed-by: NJon Masters <jcm@redhat.com>
      Tested-by: NJon Masters <jcm@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4df98b3f
    • T
      x86/speculation/mds: Clear CPU buffers on exit to user · e4fa775b
      Thomas Gleixner 提交于
      commit 04dcbdb8057827b043b3c71aa397c4c63e67d086 upstream
      
      Add a static key which controls the invocation of the CPU buffer clear
      mechanism on exit to user space and add the call into
      prepare_exit_to_usermode() and do_nmi() right before actually returning.
      
      Add documentation which kernel to user space transition this covers and
      explain why some corner cases are not mitigated.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
      Reviewed-by: NJon Masters <jcm@redhat.com>
      Tested-by: NJon Masters <jcm@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4fa775b
    • T
      x86/speculation/mds: Add mds_clear_cpu_buffers() · 1f7c31be
      Thomas Gleixner 提交于
      commit 6a9e529272517755904b7afa639f6db59ddb793e upstream
      
      The Microarchitectural Data Sampling (MDS) vulernabilities are mitigated by
      clearing the affected CPU buffers. The mechanism for clearing the buffers
      uses the unused and obsolete VERW instruction in combination with a
      microcode update which triggers a CPU buffer clear when VERW is executed.
      
      Provide a inline function with the assembly magic. The argument of the VERW
      instruction must be a memory operand as documented:
      
        "MD_CLEAR enumerates that the memory-operand variant of VERW (for
         example, VERW m16) has been extended to also overwrite buffers affected
         by MDS. This buffer overwriting functionality is not guaranteed for the
         register operand variant of VERW."
      
      Documentation also recommends to use a writable data segment selector:
      
        "The buffer overwriting occurs regardless of the result of the VERW
         permission check, as well as when the selector is null or causes a
         descriptor load segment violation. However, for lowest latency we
         recommend using a selector that indicates a valid writable data
         segment."
      
      Add x86 specific documentation about MDS and the internal workings of the
      mitigation.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>
      Reviewed-by: NJon Masters <jcm@redhat.com>
      Tested-by: NJon Masters <jcm@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1f7c31be
  4. 06 12月, 2018 7 次提交
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354
      Thomas Gleixner 提交于
      commit 6b3e64c2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIPB works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ec2354
    • T
      x86/speculation: Add prctl() control for indirect branch speculation · 238ba6e7
      Thomas Gleixner 提交于
      commit 9137bb27 upstream
      
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      238ba6e7
    • T
      x86/speculation: Prepare for conditional IBPB in switch_mm() · a1788815
      Thomas Gleixner 提交于
      commit 4c71a2b6 upstream
      
      The IBPB speculation barrier is issued from switch_mm() when the kernel
      switches to a user space task with a different mm than the user space task
      which ran last on the same CPU.
      
      An additional optimization is to avoid IBPB when the incoming task can be
      ptraced by the outgoing task. This optimization only works when switching
      directly between two user space tasks. When switching from a kernel task to
      a user space task the optimization fails because the previous task cannot
      be accessed anymore. So for quite some scenarios the optimization is just
      adding overhead.
      
      The upcoming conditional IBPB support will issue IBPB only for user space
      tasks which have the TIF_SPEC_IB bit set. This requires to handle the
      following cases:
      
        1) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB set to a user space task (potential victim) which has
           TIF_SPEC_IB not set.
      
        2) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB not set to a user space task (potential victim) which has
           TIF_SPEC_IB set.
      
      This needs to be optimized for the case where the IBPB can be avoided when
      only kernel threads ran in between user space tasks which belong to the
      same process.
      
      The current check whether two tasks belong to the same context is using the
      tasks context id. While correct, it's simpler to use the mm pointer because
      it allows to mangle the TIF_SPEC_IB bit into it. The context id based
      mechanism requires extra storage, which creates worse code.
      
      When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into
      the per CPU storage which is used to track the last user space mm which was
      running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of
      the incoming task to make the decision whether IBPB needs to be issued or
      not to cover the two cases above.
      
      As conditional IBPB is going to be the default, remove the dubious ptrace
      check for the IBPB always case and simply issue IBPB always when the
      process changes.
      
      Move the storage to a different place in the struct as the original one
      created a hole.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1788815
    • T
      x86/speculation: Add command line control for indirect branch speculation · 71187543
      Thomas Gleixner 提交于
      commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream
      
      Add command line control for user space indirect branch speculation
      mitigations. The new option is: spectre_v2_user=
      
      The initial options are:
      
          -  on:   Unconditionally enabled
          - off:   Unconditionally disabled
          -auto:   Kernel selects mitigation (default off for now)
      
      When the spectre_v2= command line argument is either 'on' or 'off' this
      implies that the application to application control follows that state even
      if a contradicting spectre_v2_user= argument is supplied.
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71187543
    • Z
      x86/retpoline: Remove minimal retpoline support · 90d2c53f
      Zhenzhong Duan 提交于
      commit ef014aae8f1cd2793e4e014bbb102bed53f852b7 upstream
      
      Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
      reason to keep the minimal retpoline support around which only provided
      basic protection in the assembly files.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <srinivas.eeda@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d2c53f
    • Z
      x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support · 8c4ad5d3
      Zhenzhong Duan 提交于
      commit 4cd24de3a0980bf3100c9dcb08ef65ca7c31af48 upstream
      
      Since retpoline capable compilers are widely available, make
      CONFIG_RETPOLINE hard depend on the compiler capability.
      
      Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
      support it. Emit an error message in that case:
      
       "arch/x86/Makefile:226: *** You are building kernel with non-retpoline
        compiler, please update your compiler..  Stop."
      
      [dwmw: Fail the build with non-retpoline compiler]
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: <srinivas.eeda@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c4ad5d3
    • Z
      x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant · cbc93677
      Zhenzhong Duan 提交于
      commit 0cbb76d6285794f30953bfa3ab831714b59dd700 upstream
      
      ..so that they match their asm counterpart.
      
      Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang YanQing <udknight@gmail.com>
      Cc: dhaval.giani@oracle.com
      Cc: srinivas.eeda@oracle.com
      Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbc93677
  5. 03 8月, 2018 1 次提交
    • S
      x86/speculation: Support Enhanced IBRS on future CPUs · 706d5168
      Sai Praneeth 提交于
      Future Intel processors will support "Enhanced IBRS" which is an "always
      on" mode i.e. IBRS bit in SPEC_CTRL MSR is enabled once and never
      disabled.
      
      From the specification [1]:
      
       "With enhanced IBRS, the predicted targets of indirect branches
        executed cannot be controlled by software that was executed in a less
        privileged predictor mode or on another logical processor. As a
        result, software operating on a processor with enhanced IBRS need not
        use WRMSR to set IA32_SPEC_CTRL.IBRS after every transition to a more
        privileged predictor mode. Software can isolate predictor modes
        effectively simply by setting the bit once. Software need not disable
        enhanced IBRS prior to entering a sleep state such as MWAIT or HLT."
      
      If Enhanced IBRS is supported by the processor then use it as the
      preferred spectre v2 mitigation mechanism instead of Retpoline. Intel's
      Retpoline white paper [2] states:
      
       "Retpoline is known to be an effective branch target injection (Spectre
        variant 2) mitigation on Intel processors belonging to family 6
        (enumerated by the CPUID instruction) that do not have support for
        enhanced IBRS. On processors that support enhanced IBRS, it should be
        used for mitigation instead of retpoline."
      
      The reason why Enhanced IBRS is the recommended mitigation on processors
      which support it is that these processors also support CET which
      provides a defense against ROP attacks. Retpoline is very similar to ROP
      techniques and might trigger false positives in the CET defense.
      
      If Enhanced IBRS is selected as the mitigation technique for spectre v2,
      the IBRS bit in SPEC_CTRL MSR is set once at boot time and never
      cleared. Kernel also has to make sure that IBRS bit remains set after
      VMEXIT because the guest might have cleared the bit. This is already
      covered by the existing x86_spec_ctrl_set_guest() and
      x86_spec_ctrl_restore_host() speculation control functions.
      
      Enhanced IBRS still requires IBPB for full mitigation.
      
      [1] Speculative-Execution-Side-Channel-Mitigations.pdf
      [2] Retpoline-A-Branch-Target-Injection-Mitigation.pdf
      Both documents are available at:
      https://bugzilla.kernel.org/show_bug.cgi?id=199511Originally-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Signed-off-by: NSai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Tim C Chen <tim.c.chen@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Link: https://lkml.kernel.org/r/1533148945-24095-1-git-send-email-sai.praneeth.prakhya@intel.com
      706d5168
  6. 19 7月, 2018 1 次提交
  7. 17 5月, 2018 2 次提交
  8. 15 5月, 2018 1 次提交
  9. 14 5月, 2018 1 次提交
  10. 05 5月, 2018 1 次提交
  11. 04 5月, 2018 1 次提交
    • W
      bpf, x86_32: add eBPF JIT compiler for ia32 · 03f5781b
      Wang YanQing 提交于
      The JIT compiler emits ia32 bit instructions. Currently, It supports eBPF
      only. Classic BPF is supported because of the conversion by BPF core.
      
      Almost all instructions from eBPF ISA supported except the following:
      BPF_ALU64 | BPF_DIV | BPF_K
      BPF_ALU64 | BPF_DIV | BPF_X
      BPF_ALU64 | BPF_MOD | BPF_K
      BPF_ALU64 | BPF_MOD | BPF_X
      BPF_STX | BPF_XADD | BPF_W
      BPF_STX | BPF_XADD | BPF_DW
      
      It doesn't support BPF_JMP|BPF_CALL with BPF_PSEUDO_CALL at the moment.
      
      IA32 has few general purpose registers, EAX|EDX|ECX|EBX|ESI|EDI. I use
      EAX|EDX|ECX|EBX as temporary registers to simulate instructions in eBPF
      ISA, and allocate ESI|EDI to BPF_REG_AX for constant blinding, all others
      eBPF registers, R0-R10, are simulated through scratch space on stack.
      
      The reasons behind the hardware registers allocation policy are:
      1:MUL need EAX:EDX, shift operation need ECX, so they aren't fit
        for general eBPF 64bit register simulation.
      2:We need at least 4 registers to simulate most eBPF ISA operations
        on registers operands instead of on register&memory operands.
      3:We need to put BPF_REG_AX on hardware registers, or constant blinding
        will degrade jit performance heavily.
      
      Tested on PC (Intel(R) Core(TM) i5-5200U CPU).
      Testing results on i5-5200U:
      1) test_bpf: Summary: 349 PASSED, 0 FAILED, [319/341 JIT'ed]
      2) test_progs: Summary: 83 PASSED, 0 FAILED.
      3) test_lpm: OK
      4) test_lru_map: OK
      5) test_verifier: Summary: 828 PASSED, 0 FAILED.
      
      Above tests are all done in following two conditions separately:
      1:bpf_jit_enable=1 and bpf_jit_harden=0
      2:bpf_jit_enable=1 and bpf_jit_harden=2
      
      Below are some numbers for this jit implementation:
      Note:
        I run test_progs in kselftest 100 times continuously for every condition,
        the numbers are in format: total/times=avg.
        The numbers that test_bpf reports show almost the same relation.
      
      a:jit_enable=0 and jit_harden=0            b:jit_enable=1 and jit_harden=0
        test_pkt_access:PASS:ipv4:15622/100=156    test_pkt_access:PASS:ipv4:10674/100=106
        test_pkt_access:PASS:ipv6:9130/100=91      test_pkt_access:PASS:ipv6:4855/100=48
        test_xdp:PASS:ipv4:240198/100=2401         test_xdp:PASS:ipv4:138912/100=1389
        test_xdp:PASS:ipv6:137326/100=1373         test_xdp:PASS:ipv6:68542/100=685
        test_l4lb:PASS:ipv4:61100/100=611          test_l4lb:PASS:ipv4:37302/100=373
        test_l4lb:PASS:ipv6:101000/100=1010        test_l4lb:PASS:ipv6:55030/100=550
      
      c:jit_enable=1 and jit_harden=2
        test_pkt_access:PASS:ipv4:10558/100=105
        test_pkt_access:PASS:ipv6:5092/100=50
        test_xdp:PASS:ipv4:131902/100=1319
        test_xdp:PASS:ipv6:77932/100=779
        test_l4lb:PASS:ipv4:38924/100=389
        test_l4lb:PASS:ipv6:57520/100=575
      
      The numbers show we get 30%~50% improvement.
      
      See Documentation/networking/filter.txt for more information.
      
      Changelog:
      
       Changes v5-v6:
       1:Add do {} while (0) to RETPOLINE_RAX_BPF_JIT for
         consistence reason.
       2:Clean up non-standard comments, reported by Daniel Borkmann.
       3:Fix a memory leak issue, repoted by Daniel Borkmann.
      
       Changes v4-v5:
       1:Delete is_on_stack, BPF_REG_AX is the only one
         on real hardware registers, so just check with
         it.
       2:Apply commit 1612a981 ("bpf, x64: fix JIT emission
         for dead code"), suggested by Daniel Borkmann.
      
       Changes v3-v4:
       1:Fix changelog in commit.
         I install llvm-6.0, then test_progs willn't report errors.
         I submit another patch:
         "bpf: fix misaligned access for BPF_PROG_TYPE_PERF_EVENT program type on x86_32 platform"
         to fix another problem, after that patch, test_verifier willn't report errors too.
       2:Fix clear r0[1] twice unnecessarily in *BPF_IND|BPF_ABS* simulation.
      
       Changes v2-v3:
       1:Move BPF_REG_AX to real hardware registers for performance reason.
       3:Using bpf_load_pointer instead of bpf_jit32.S, suggested by Daniel Borkmann.
       4:Delete partial codes in 1c2a088a, suggested by Daniel Borkmann.
       5:Some bug fixes and comments improvement.
      
       Changes v1-v2:
       1:Fix bug in emit_ia32_neg64.
       2:Fix bug in emit_ia32_arsh_r64.
       3:Delete filename in top level comment, suggested by Thomas Gleixner.
       4:Delete unnecessary boiler plate text, suggested by Thomas Gleixner.
       5:Rewrite some words in changelog.
       6:CodingSytle improvement and a little more comments.
      Signed-off-by: NWang YanQing <udknight@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      03f5781b
  12. 03 5月, 2018 7 次提交
  13. 14 3月, 2018 1 次提交
  14. 23 2月, 2018 1 次提交
    • D
      bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann 提交于
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax that is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter aligns also with what gcc / clang emits (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2bSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      a493a87f
  15. 21 2月, 2018 2 次提交
  16. 20 2月, 2018 2 次提交
  17. 15 2月, 2018 1 次提交
  18. 13 2月, 2018 1 次提交
    • D
      Revert "x86/speculation: Simplify indirect_branch_prediction_barrier()" · f208820a
      David Woodhouse 提交于
      This reverts commit 64e16720.
      
      We cannot call C functions like that, without marking all the
      call-clobbered registers as, well, clobbered. We might have got away
      with it for now because the __ibp_barrier() function was *fairly*
      unlikely to actually use any other registers. But no. Just no.
      Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arjan.van.de.ven@intel.com
      Cc: dave.hansen@intel.com
      Cc: jmattson@google.com
      Cc: karahmed@amazon.de
      Cc: kvm@vger.kernel.org
      Cc: pbonzini@redhat.com
      Cc: rkrcmar@redhat.com
      Cc: sironi@amazon.de
      Link: http://lkml.kernel.org/r/1518305967-31356-3-git-send-email-dwmw@amazon.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f208820a
  19. 03 2月, 2018 1 次提交
  20. 28 1月, 2018 3 次提交
  21. 26 1月, 2018 1 次提交