1. 06 12月, 2018 40 次提交
    • T
      x86/speculation: Add seccomp Spectre v2 user space protection mode · d1ec2354
      Thomas Gleixner 提交于
      commit 6b3e64c237c072797a9ec918654a60e3a46488e2 upstream
      
      If 'prctl' mode of user space protection from spectre v2 is selected
      on the kernel command-line, STIBP and IBPB are applied on tasks which
      restrict their indirect branch speculation via prctl.
      
      SECCOMP enables the SSBD mitigation for sandboxed tasks already, so it
      makes sense to prevent spectre v2 user space to user space attacks as
      well.
      
      The Intel mitigation guide documents how STIPB works:
          
         Setting bit 1 (STIBP) of the IA32_SPEC_CTRL MSR on a logical processor
         prevents the predicted targets of indirect branches on any logical
         processor of that core from being controlled by software that executes
         (or executed previously) on another logical processor of the same core.
      
      Ergo setting STIBP protects the task itself from being attacked from a task
      running on a different hyper-thread and protects the tasks running on
      different hyper-threads from being attacked.
      
      While the document suggests that the branch predictors are shielded between
      the logical processors, the observed performance regressions suggest that
      STIBP simply disables the branch predictor more or less completely. Of
      course the document wording is vague, but the fact that there is also no
      requirement for issuing IBPB when STIBP is used points clearly in that
      direction. The kernel still issues IBPB even when STIBP is used until Intel
      clarifies the whole mechanism.
      
      IBPB is issued when the task switches out, so malicious sandbox code cannot
      mistrain the branch predictor for the next user space task on the same
      logical processor.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185006.051663132@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d1ec2354
    • T
      x86/speculation: Enable prctl mode for spectre_v2_user · 7b62ef14
      Thomas Gleixner 提交于
      commit 7cc765a67d8e04ef7d772425ca5a2a1e2b894c15 upstream
      
      Now that all prerequisites are in place:
      
       - Add the prctl command line option
      
       - Default the 'auto' mode to 'prctl'
      
       - When SMT state changes, update the static key which controls the
         conditional STIBP evaluation on context switch.
      
       - At init update the static key which controls the conditional IBPB
         evaluation on context switch.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.958421388@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      7b62ef14
    • T
      x86/speculation: Add prctl() control for indirect branch speculation · 238ba6e7
      Thomas Gleixner 提交于
      commit 9137bb27 upstream
      
      Add the PR_SPEC_INDIRECT_BRANCH option for the PR_GET_SPECULATION_CTRL and
      PR_SET_SPECULATION_CTRL prctls to allow fine grained per task control of
      indirect branch speculation via STIBP and IBPB.
      
      Invocations:
       Check indirect branch speculation status with
       - prctl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, 0, 0, 0);
      
       Enable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_ENABLE, 0, 0);
      
       Disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE, 0, 0);
      
       Force disable indirect branch speculation with
       - prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_FORCE_DISABLE, 0, 0);
      
      See Documentation/userspace-api/spec_ctrl.rst.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.866780996@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      238ba6e7
    • T
      x86/speculation: Prepare arch_smt_update() for PRCTL mode · f67fafb8
      Thomas Gleixner 提交于
      commit 6893a959d7fdebbab5f5aa112c277d5a44435ba1 upstream
      
      The upcoming fine grained per task STIBP control needs to be updated on CPU
      hotplug as well.
      
      Split out the code which controls the strict mode so the prctl control code
      can be added later. Mark the SMP function call argument __unused while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.759457117@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f67fafb8
    • T
      x86/speculation: Prevent stale SPEC_CTRL msr content · e8412401
      Thomas Gleixner 提交于
      commit 6d991ba509ebcfcc908e009d1db51972a4f7a064 upstream
      
      The seccomp speculation control operates on all tasks of a process, but
      only the current task of a process can update the MSR immediately. For the
      other threads the update is deferred to the next context switch.
      
      This creates the following situation with Process A and B:
      
      Process A task 2 and Process B task 1 are pinned on CPU1. Process A task 2
      does not have the speculation control TIF bit set. Process B task 1 has the
      speculation control TIF bit set.
      
      CPU0					CPU1
      					MSR bit is set
      					ProcB.T1 schedules out
      					ProcA.T2 schedules in
      					MSR bit is cleared
      ProcA.T1
        seccomp_update()
        set TIF bit on ProcA.T2
      					ProcB.T1 schedules in
      					MSR is not updated  <-- FAIL
      
      This happens because the context switch code tries to avoid the MSR update
      if the speculation control TIF bits of the incoming and the outgoing task
      are the same. In the worst case ProcB.T1 and ProcA.T2 are the only tasks
      scheduling back and forth on CPU1, which keeps the MSR stale forever.
      
      In theory this could be remedied by IPIs, but chasing the remote task which
      could be migrated is complex and full of races.
      
      The straight forward solution is to avoid the asychronous update of the TIF
      bit and defer it to the next context switch. The speculation control state
      is stored in task_struct::atomic_flags by the prctl and seccomp updates
      already.
      
      Add a new TIF_SPEC_FORCE_UPDATE bit and set this after updating the
      atomic_flags. Check the bit on context switch and force a synchronous
      update of the speculation control if set. Use the same mechanism for
      updating the current task.
      Reported-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811272247140.1875@nanos.tec.linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8412401
    • T
      x86/speculation: Split out TIF update · 59028be1
      Thomas Gleixner 提交于
      commit e6da8bb6f9abb2628381904b24163c770e630bac upstream
      
      The update of the TIF_SSBD flag and the conditional speculation control MSR
      update is done in the ssb_prctl_set() function directly. The upcoming prctl
      support for controlling indirect branch speculation via STIBP needs the
      same mechanism.
      
      Split the code out and make it reusable. Reword the comment about updates
      for other tasks.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.652305076@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      59028be1
    • T
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · aecb9969
      Thomas Gleixner 提交于
      commit 46f7ecb1e7359f183f5bbd1e08b90e10e52164f9 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aecb9969
    • T
      x86/speculation: Prepare for conditional IBPB in switch_mm() · a1788815
      Thomas Gleixner 提交于
      commit 4c71a2b6fd7e42814aa68a6dec88abf3b42ea573 upstream
      
      The IBPB speculation barrier is issued from switch_mm() when the kernel
      switches to a user space task with a different mm than the user space task
      which ran last on the same CPU.
      
      An additional optimization is to avoid IBPB when the incoming task can be
      ptraced by the outgoing task. This optimization only works when switching
      directly between two user space tasks. When switching from a kernel task to
      a user space task the optimization fails because the previous task cannot
      be accessed anymore. So for quite some scenarios the optimization is just
      adding overhead.
      
      The upcoming conditional IBPB support will issue IBPB only for user space
      tasks which have the TIF_SPEC_IB bit set. This requires to handle the
      following cases:
      
        1) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB set to a user space task (potential victim) which has
           TIF_SPEC_IB not set.
      
        2) Switch from a user space task (potential attacker) which has
           TIF_SPEC_IB not set to a user space task (potential victim) which has
           TIF_SPEC_IB set.
      
      This needs to be optimized for the case where the IBPB can be avoided when
      only kernel threads ran in between user space tasks which belong to the
      same process.
      
      The current check whether two tasks belong to the same context is using the
      tasks context id. While correct, it's simpler to use the mm pointer because
      it allows to mangle the TIF_SPEC_IB bit into it. The context id based
      mechanism requires extra storage, which creates worse code.
      
      When a task is scheduled out its TIF_SPEC_IB bit is mangled as bit 0 into
      the per CPU storage which is used to track the last user space mm which was
      running on a CPU. This bit can be used together with the TIF_SPEC_IB bit of
      the incoming task to make the decision whether IBPB needs to be issued or
      not to cover the two cases above.
      
      As conditional IBPB is going to be the default, remove the dubious ptrace
      check for the IBPB always case and simply issue IBPB always when the
      process changes.
      
      Move the storage to a different place in the struct as the original one
      created a hole.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.466447057@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a1788815
    • T
      x86/speculation: Avoid __switch_to_xtra() calls · dd73e15e
      Thomas Gleixner 提交于
      commit 5635d999 upstream
      
      The TIF_SPEC_IB bit does not need to be evaluated in the decision to invoke
      __switch_to_xtra() when:
      
       - CONFIG_SMP is disabled
      
       - The conditional STIPB mode is disabled
      
      The TIF_SPEC_IB bit still controls IBPB in both cases so the TIF work mask
      checks might invoke __switch_to_xtra() for nothing if TIF_SPEC_IB is the
      only set bit in the work masks.
      
      Optimize it out by masking the bit at compile time for CONFIG_SMP=n and at
      run time when the static key controlling the conditional STIBP mode is
      disabled.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.374062201@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      dd73e15e
    • T
      x86/process: Consolidate and simplify switch_to_xtra() code · a87c81f0
      Thomas Gleixner 提交于
      commit ff16701a upstream
      
      Move the conditional invocation of __switch_to_xtra() into an inline
      function so the logic can be shared between 32 and 64 bit.
      
      Remove the handthrough of the TSS pointer and retrieve the pointer directly
      in the bitmap handling function. Use this_cpu_ptr() instead of the
      per_cpu() indirection.
      
      This is a preparatory change so integration of conditional indirect branch
      speculation optimization happens only in one place.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.280855518@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a87c81f0
    • T
      x86/speculation: Prepare for per task indirect branch speculation control · 69985a2c
      Tim Chen 提交于
      commit 5bfbe3ad upstream
      
      To avoid the overhead of STIBP always on, it's necessary to allow per task
      control of STIBP.
      
      Add a new task flag TIF_SPEC_IB and evaluate it during context switch if
      SMT is active and flag evaluation is enabled by the speculation control
      code. Add the conditional evaluation to x86_virt_spec_ctrl() as well so the
      guest/host switch works properly.
      
      This has no effect because TIF_SPEC_IB cannot be set yet and the static key
      which controls evaluation is off. Preparatory patch for adding the control
      code.
      
      [ tglx: Simplify the context switch logic and make the TIF evaluation
        	depend on SMP=y and on the static key controlling the conditional
        	update. Rename it to TIF_SPEC_IB because it controls both STIBP and
        	IBPB ]
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.176917199@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      69985a2c
    • T
      x86/speculation: Add command line control for indirect branch speculation · 71187543
      Thomas Gleixner 提交于
      commit fa1202ef224391b6f5b26cdd44cc50495e8fab54 upstream
      
      Add command line control for user space indirect branch speculation
      mitigations. The new option is: spectre_v2_user=
      
      The initial options are:
      
          -  on:   Unconditionally enabled
          - off:   Unconditionally disabled
          -auto:   Kernel selects mitigation (default off for now)
      
      When the spectre_v2= command line argument is either 'on' or 'off' this
      implies that the application to application control follows that state even
      if a contradicting spectre_v2_user= argument is supplied.
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.082720373@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      71187543
    • T
      x86/speculation: Unify conditional spectre v2 print functions · 8a34c706
      Thomas Gleixner 提交于
      commit 495d470e9828500e0155027f230449ac5e29c025 upstream
      
      There is no point in having two functions and a conditional at the call
      site.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.986890749@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8a34c706
    • T
      x86/speculataion: Mark command line parser data __initdata · 507ac84e
      Thomas Gleixner 提交于
      commit 30ba72a9 upstream
      
      No point to keep that around.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.893886356@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      507ac84e
    • T
      x86/speculation: Mark string arrays const correctly · 275d90f0
      Thomas Gleixner 提交于
      commit 8770709f upstream
      
      checkpatch.pl muttered when reshuffling the code:
       WARNING: static const char * array should probably be static const char * const
      
      Fix up all the string arrays.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.800018931@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      275d90f0
    • T
      x86/speculation: Reorder the spec_v2 code · 10c24dc6
      Thomas Gleixner 提交于
      commit 15d6b7aab0793b2de8a05d8a828777dd24db424e upstream
      
      Reorder the code so it is better grouped. No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.707122879@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      10c24dc6
    • T
      x86/l1tf: Show actual SMT state · 1572793b
      Thomas Gleixner 提交于
      commit 130d6f94 upstream
      
      Use the now exposed real SMT state, not the SMT sysfs control knob
      state. This reflects the state of the system when the mitigation status is
      queried.
      
      This does not change the warning in the VMX launch code. There the
      dependency on the control knob makes sense because siblings could be
      brought online anytime after launching the VM.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.613357354@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1572793b
    • T
      x86/speculation: Rework SMT state change · f55e301e
      Thomas Gleixner 提交于
      commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
      
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow finegrained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f55e301e
    • T
      sched/smt: Expose sched_smt_present static key · 340693ee
      Thomas Gleixner 提交于
      commit 321a874a7ef85655e93b3206d0f36b4a6097f948 upstream
      
      Make the scheduler's 'sched_smt_present' static key globaly available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      340693ee
    • T
      x86/Kconfig: Select SCHED_SMT if SMP enabled · f98bf164
      Thomas Gleixner 提交于
      commit dbe73364 upstream
      
      CONFIG_SCHED_SMT is enabled by all distros, so there is not a real point to
      have it configurable. The runtime overhead in the core scheduler code is
      minimal because the actual SMT scheduling parts are conditional on a static
      key.
      
      This allows to expose the scheduler's SMT state static key to the
      speculation control code. Alternatively the scheduler's static key could be
      made always available when CONFIG_SMP is enabled, but that's just adding an
      unused static key to every other architecture for nothing.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.337452245@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f98bf164
    • P
      sched/smt: Make sched_smt_present track topology · a2c09481
      Peter Zijlstra (Intel) 提交于
      commit c5511d03ec090980732e929c318a7a6374b5550e upstream
      
      Currently the 'sched_smt_present' static key is enabled when at CPU bringup
      SMT topology is observed, but it is never disabled. However there is demand
      to also disable the key when the topology changes such that there is no SMT
      present anymore.
      
      Implement this by making the key count the number of cores that have SMT
      enabled.
      
      In particular, the SMT topology bits are set before interrrupts are enabled
      and similarly, are cleared after interrupts are disabled for the last time
      and the CPU dies.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2c09481
    • T
      x86/speculation: Reorganize speculation control MSRs update · 6308dde5
      Tim Chen 提交于
      commit 01daf56875ee0cd50ed496a09b20eb369b45dfa5 upstream
      
      The logic to detect whether there's a change in the previous and next
      task's flag relevant to update speculation control MSRs is spread out
      across multiple functions.
      
      Consolidate all checks needed for updating speculation control MSRs into
      the new __speculation_ctrl_update() helper function.
      
      This makes it easy to pick the right speculation control MSR and the bits
      in MSR_IA32_SPEC_CTRL that need updating based on TIF flags changes.
      Originally-by: NThomas Lendacky <Thomas.Lendacky@amd.com>
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.151077005@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6308dde5
    • T
      x86/speculation: Rename SSBD update functions · 39402a5e
      Thomas Gleixner 提交于
      commit 26c4d75b234040c11728a8acb796b3a85ba7507c upstream
      
      During context switch, the SSBD bit in SPEC_CTRL MSR is updated according
      to changes of the TIF_SSBD flag in the current and next running task.
      
      Currently, only the bit controlling speculative store bypass disable in
      SPEC_CTRL MSR is updated and the related update functions all have
      "speculative_store" or "ssb" in their names.
      
      For enhanced mitigation control other bits in SPEC_CTRL MSR need to be
      updated as well, which makes the SSB names inadequate.
      
      Rename the "speculative_store*" functions to a more generic name. No
      functional change.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.058866968@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39402a5e
    • T
      x86/speculation: Disable STIBP when enhanced IBRS is in use · ca41d792
      Tim Chen 提交于
      commit 34bce7c9 upstream
      
      If enhanced IBRS is active, STIBP is redundant for mitigating Spectre v2
      user space exploits from hyperthread sibling.
      
      Disable STIBP when enhanced IBRS is used.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185003.966801480@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ca41d792
    • T
      x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common() · f37b55ab
      Tim Chen 提交于
      commit a8f76ae4 upstream
      
      The Spectre V2 printout in cpu_show_common() handles conditionals for the
      various mitigation methods directly in the sprintf() argument list. That's
      hard to read and will become unreadable if more complex decisions need to
      be made for a particular method.
      
      Move the conditionals for STIBP and IBPB string selection into helper
      functions, so they can be extended later on.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185003.874479208@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f37b55ab
    • T
      x86/speculation: Remove unnecessary ret variable in cpu_show_common() · 30a8e214
      Tim Chen 提交于
      commit b86bda04 upstream
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185003.783903657@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      30a8e214
    • T
      x86/speculation: Clean up spectre_v2_parse_cmdline() · 2ee44e2d
      Tim Chen 提交于
      commit 24848509 upstream
      
      Remove the unnecessary 'else' statement in spectre_v2_parse_cmdline()
      to save an indentation level.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185003.688010903@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2ee44e2d
    • T
      x86/speculation: Update the TIF_SSBD comment · e8494e5d
      Tim Chen 提交于
      commit 8eb729b77faf83ac4c1f363a9ad68d042415f24c upstream
      
      "Reduced Data Speculation" is an obsolete term. The correct new name is
      "Speculative store bypass disable" - which is abbreviated into SSBD.
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185003.593893901@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e8494e5d
    • Z
      x86/retpoline: Remove minimal retpoline support · 90d2c53f
      Zhenzhong Duan 提交于
      commit ef014aae upstream
      
      Now that CONFIG_RETPOLINE hard depends on compiler support, there is no
      reason to keep the minimal retpoline support around which only provided
      basic protection in the assembly files.
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <srinivas.eeda@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/f06f0a89-5587-45db-8ed2-0a9d6638d5c0@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d2c53f
    • Z
      x86/retpoline: Make CONFIG_RETPOLINE depend on compiler support · 8c4ad5d3
      Zhenzhong Duan 提交于
      commit 4cd24de3 upstream
      
      Since retpoline capable compilers are widely available, make
      CONFIG_RETPOLINE hard depend on the compiler capability.
      
      Break the build when CONFIG_RETPOLINE is enabled and the compiler does not
      support it. Emit an error message in that case:
      
       "arch/x86/Makefile:226: *** You are building kernel with non-retpoline
        compiler, please update your compiler..  Stop."
      
      [dwmw: Fail the build with non-retpoline compiler]
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Michal Marek <michal.lkml@markovi.net>
      Cc: <srinivas.eeda@oracle.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/cca0cb20-f9e2-4094-840b-fb0f8810cd34@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c4ad5d3
    • Z
      x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant · cbc93677
      Zhenzhong Duan 提交于
      commit 0cbb76d6285794f30953bfa3ab831714b59dd700 upstream
      
      ..so that they match their asm counterpart.
      
      Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it.
      Signed-off-by: NZhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wang YanQing <udknight@gmail.com>
      Cc: dhaval.giani@oracle.com
      Cc: srinivas.eeda@oracle.com
      Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@defaultSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cbc93677
    • J
      x86/speculation: Propagate information about RSB filling mitigation to sysfs · 44a07610
      Jiri Kosina 提交于
      commit bb4b3b77 upstream
      
      If spectrev2 mitigation has been enabled, RSB is filled on context switch
      in order to protect from various classes of spectrev2 attacks.
      
      If this mitigation is enabled, say so in sysfs for spectrev2.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438580.15880@cbobk.fhfr.pmSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      44a07610
    • J
      x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · cacd9385
      Jiri Kosina 提交于
      commit dbfe2953f63c640463c630746cd5d9de8b2f63ae upstream
      
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processess (such as GPG) from data leaking into a different
      userspace process via spectre v2.
      
      This is however completely insufficient to provide proper userspace-to-userpace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processess where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pmSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      cacd9385
    • J
      x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · b07fc04c
      Jiri Kosina 提交于
      commit 53c613fe6349994f023245519265999eed75957f upstream
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pmSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b07fc04c
    • J
      tipc: fix lockdep warning during node delete · 4e3fbd74
      Jon Maloy 提交于
      [ Upstream commit ec835f891232d7763dea9da0358f31e24ca6dfb7 ]
      
      We see the following lockdep warning:
      
      [ 2284.078521] ======================================================
      [ 2284.078604] WARNING: possible circular locking dependency detected
      [ 2284.078604] 4.19.0+ #42 Tainted: G            E
      [ 2284.078604] ------------------------------------------------------
      [ 2284.078604] rmmod/254 is trying to acquire lock:
      [ 2284.078604] 00000000acd94e28 ((&n->timer)#2){+.-.}, at: del_timer_sync+0x5/0xa0
      [ 2284.078604]
      [ 2284.078604] but task is already holding lock:
      [ 2284.078604] 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x190 [tipc]
      [ 2284.078604]
      [ 2284.078604] which lock already depends on the new lock.
      [ 2284.078604]
      [ 2284.078604]
      [ 2284.078604] the existing dependency chain (in reverse order) is:
      [ 2284.078604]
      [ 2284.078604] -> #1 (&(&tn->node_list_lock)->rlock){+.-.}:
      [ 2284.078604]        tipc_node_timeout+0x20a/0x330 [tipc]
      [ 2284.078604]        call_timer_fn+0xa1/0x280
      [ 2284.078604]        run_timer_softirq+0x1f2/0x4d0
      [ 2284.078604]        __do_softirq+0xfc/0x413
      [ 2284.078604]        irq_exit+0xb5/0xc0
      [ 2284.078604]        smp_apic_timer_interrupt+0xac/0x210
      [ 2284.078604]        apic_timer_interrupt+0xf/0x20
      [ 2284.078604]        default_idle+0x1c/0x140
      [ 2284.078604]        do_idle+0x1bc/0x280
      [ 2284.078604]        cpu_startup_entry+0x19/0x20
      [ 2284.078604]        start_secondary+0x187/0x1c0
      [ 2284.078604]        secondary_startup_64+0xa4/0xb0
      [ 2284.078604]
      [ 2284.078604] -> #0 ((&n->timer)#2){+.-.}:
      [ 2284.078604]        del_timer_sync+0x34/0xa0
      [ 2284.078604]        tipc_node_delete+0x1a/0x40 [tipc]
      [ 2284.078604]        tipc_node_stop+0xcb/0x190 [tipc]
      [ 2284.078604]        tipc_net_stop+0x154/0x170 [tipc]
      [ 2284.078604]        tipc_exit_net+0x16/0x30 [tipc]
      [ 2284.078604]        ops_exit_list.isra.8+0x36/0x70
      [ 2284.078604]        unregister_pernet_operations+0x87/0xd0
      [ 2284.078604]        unregister_pernet_subsys+0x1d/0x30
      [ 2284.078604]        tipc_exit+0x11/0x6f2 [tipc]
      [ 2284.078604]        __x64_sys_delete_module+0x1df/0x240
      [ 2284.078604]        do_syscall_64+0x66/0x460
      [ 2284.078604]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 2284.078604]
      [ 2284.078604] other info that might help us debug this:
      [ 2284.078604]
      [ 2284.078604]  Possible unsafe locking scenario:
      [ 2284.078604]
      [ 2284.078604]        CPU0                    CPU1
      [ 2284.078604]        ----                    ----
      [ 2284.078604]   lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]                                lock((&n->timer)#2);
      [ 2284.078604]                                lock(&(&tn->node_list_lock)->rlock);
      [ 2284.078604]   lock((&n->timer)#2);
      [ 2284.078604]
      [ 2284.078604]  *** DEADLOCK ***
      [ 2284.078604]
      [ 2284.078604] 3 locks held by rmmod/254:
      [ 2284.078604]  #0: 000000003368be9b (pernet_ops_rwsem){+.+.}, at: unregister_pernet_subsys+0x15/0x30
      [ 2284.078604]  #1: 0000000046ed9c86 (rtnl_mutex){+.+.}, at: tipc_net_stop+0x144/0x170 [tipc]
      [ 2284.078604]  #2: 00000000f997afc0 (&(&tn->node_list_lock)->rlock){+.-.}, at: tipc_node_stop+0xac/0x19
      [...}
      
      The reason is that the node timer handler sometimes needs to delete a
      node which has been disconnected for too long. To do this, it grabs
      the lock 'node_list_lock', which may at the same time be held by the
      generic node cleanup function, tipc_node_stop(), during module removal.
      Since the latter is calling del_timer_sync() inside the same lock, we
      have a potential deadlock.
      
      We fix this letting the timer cleanup function use spin_trylock()
      instead of just spin_lock(), and when it fails to grab the lock it
      just returns so that the timer handler can terminate its execution.
      This is safe to do, since tipc_node_stop() anyway is about to
      delete both the timer and the node instance.
      
      Fixes: 6a939f36 ("tipc: Auto removal of peer down node instance")
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4e3fbd74
    • H
      net: phy: add workaround for issue where PHY driver doesn't bind to the device · 38af4b90
      Heiner Kallweit 提交于
      [ Upstream commit c85ddecae6e5e82ca3ae6f20c63f1d865e2ff5ea ]
      
      After switching the r8169 driver to use phylib some user reported that
      their network is broken. This was caused by the genphy PHY driver being
      used instead of the dedicated PHY driver for the RTL8211B. Users
      reported that loading the Realtek PHY driver module upfront fixes the
      issue. See also this mail thread:
      https://marc.info/?t=154279781800003&r=1&w=2
      The issue is quite weird and the root cause seems to be somewhere in
      the base driver core. The patch works around the issue and may be
      removed once the actual issue is fixed.
      
      The Fixes tag refers to the first reported occurrence of the issue.
      The issue itself may have been existing much longer and it may affect
      users of other network chips as well. Users typically will recognize
      this issue only if their PHY stops working when being used with the
      genphy driver.
      
      Fixes: f1e911d5 ("r8169: add basic phylib support")
      Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
      Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      38af4b90
    • E
      tcp: defer SACK compression after DupThresh · aaa7e45c
      Eric Dumazet 提交于
      [ Upstream commit 86de5921 ]
      
      Jean-Louis reported a TCP regression and bisected to recent SACK
      compression.
      
      After a loss episode (receiver not able to keep up and dropping
      packets because its backlog is full), linux TCP stack is sending
      a single SACK (DUPACK).
      
      Sender waits a full RTO timer before recovering losses.
      
      While RFC 6675 says in section 5, "Algorithm Details",
      
         (2) If DupAcks < DupThresh but IsLost (HighACK + 1) returns true --
             indicating at least three segments have arrived above the current
             cumulative acknowledgment point, which is taken to indicate loss
             -- go to step (4).
      ...
         (4) Invoke fast retransmit and enter loss recovery as follows:
      
      there are old TCP stacks not implementing this strategy, and
      still counting the dupacks before starting fast retransmit.
      
      While these stacks probably perform poorly when receivers implement
      LRO/GRO, we should be a little more gentle to them.
      
      This patch makes sure we do not enable SACK compression unless
      3 dupacks have been sent since last rcv_nxt update.
      
      Ideally we should even rearm the timer to send one or two
      more DUPACK if no more packets are coming, but that will
      be work aiming for linux-4.21.
      
      Many thanks to Jean-Louis for bisecting the issue, providing
      packet captures and testing this patch.
      
      Fixes: 5d9f4262 ("tcp: add SACK compression")
      Reported-by: NJean-Louis Dupond <jean-louis@dupond.be>
      Tested-by: NJean-Louis Dupond <jean-louis@dupond.be>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aaa7e45c
    • T
      net/dim: Update DIM start sample after each DIM iteration · b8e07695
      Tal Gilboa 提交于
      [ Upstream commit 0211dda68a4f6531923a2f72d8e8959207f59fba ]
      
      On every iteration of net_dim, the algorithm may choose to
      check for the system state by comparing current data sample
      with previous data sample. After each of these comparison,
      regardless of the action taken, the sample used as baseline
      is needed to be updated.
      
      This patch fixes a bug that causes DIM to take wrong decisions,
      due to never updating the baseline sample for comparison between
      iterations. This way, DIM always compares current sample with
      zeros.
      
      Although this is a functional fix, it also improves and stabilizes
      performance as the algorithm works properly now.
      
      Performance:
      Tested single UDP TX stream with pktgen:
      samples/pktgen/pktgen_sample03_burst_single_flow.sh -i p4p2 -d 1.1.1.1
      -m 24:8a:07:88:26:8b -f 3 -b 128
      
      ConnectX-5 100GbE packet rate improved from 15-19Mpps to 19-20Mpps.
      Also, toggling between profiles is less frequent with the fix.
      
      Fixes: 8115b750 ("net/dim: use struct net_dim_sample as arg to net_dim")
      Signed-off-by: NTal Gilboa <talgi@mellanox.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8e07695
    • J
      virtio-net: fail XDP set if guest csum is negotiated · b06510bf
      Jason Wang 提交于
      [ Upstream commit 18ba58e1 ]
      
      We don't support partial csumed packet since its metadata will be lost
      or incorrect during XDP processing. So fail the XDP set if guest_csum
      feature is negotiated.
      
      Fixes: f600b690 ("virtio_net: Add XDP support")
      Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b06510bf
    • J
      virtio-net: disable guest csum during XDP set · 1af400be
      Jason Wang 提交于
      [ Upstream commit e59ff2c4 ]
      
      We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
      can receive partial csumed packets with metadata kept in the
      vnet_hdr. This may have several side effects:
      
      - It could be overridden by header adjustment, thus is might be not
        correct after XDP processing.
      - There's no way to pass such metadata information through
        XDP_REDIRECT to another driver.
      - XDP does not support checksum offload right now.
      
      So simply disable guest csum if possible in this the case of XDP.
      
      Fixes: 3f93522f ("virtio-net: switch off offloads on demand if possible on XDP set")
      Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Pavel Popa <pashinho1990@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      1af400be