1. 20 12月, 2018 1 次提交
  2. 17 12月, 2018 4 次提交
  3. 08 12月, 2018 2 次提交
    • S
      tracing/fgraph: Fix set_graph_function from showing interrupts · 81f96623
      Steven Rostedt (VMware) 提交于
      commit 5cf99a0f3161bc3ae2391269d134d6bf7e26f00e upstream.
      
      The tracefs file set_graph_function is used to only function graph functions
      that are listed in that file (or all functions if the file is empty). The
      way this is implemented is that the function graph tracer looks at every
      function, and if the current depth is zero and the function matches
      something in the file then it will trace that function. When other functions
      are called, the depth will be greater than zero (because the original
      function will be at depth zero), and all functions will be traced where the
      depth is greater than zero.
      
      The issue is that when a function is first entered, and the handler that
      checks this logic is called, the depth is set to zero. If an interrupt comes
      in and a function in the interrupt handler is traced, its depth will be
      greater than zero and it will automatically be traced, even if the original
      function was not. But because the logic only looks at depth it may trace
      interrupts when it should not be.
      
      The recent design change of the function graph tracer to fix other bugs
      caused the depth to be zero while the function graph callback handler is
      being called for a longer time, widening the race of this happening. This
      bug was actually there for a longer time, but because the race window was so
      small it seldom happened. The Fixes tag below is for the commit that widen
      the race window, because that commit belongs to a series that will also help
      fix the original bug.
      
      Cc: stable@kernel.org
      Fixes: 39eb456dacb5 ("function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack")
      Reported-by: NJoe Lawrence <joe.lawrence@redhat.com>
      Tested-by: NJoe Lawrence <joe.lawrence@redhat.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      81f96623
    • A
      uprobes: Fix handle_swbp() vs. unregister() + register() race once more · ac8edc62
      Andrea Parri 提交于
      commit 09d3f015 upstream.
      
      Commit:
      
        142b18dd ("uprobes: Fix handle_swbp() vs unregister() + register() race")
      
      added the UPROBE_COPY_INSN flag, and corresponding smp_wmb() and smp_rmb()
      memory barriers, to ensure that handle_swbp() uses fully-initialized
      uprobes only.
      
      However, the smp_rmb() is mis-placed: this barrier should be placed
      after handle_swbp() has tested for the flag, thus guaranteeing that
      (program-order) subsequent loads from the uprobe can see the initial
      stores performed by prepare_uprobe().
      
      Move the smp_rmb() accordingly.  Also amend the comments associated
      to the two memory barriers to indicate their actual locations.
      Signed-off-by: NAndrea Parri <andrea.parri@amarulasolutions.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: stable@kernel.org
      Fixes: 142b18dd ("uprobes: Fix handle_swbp() vs unregister() + register() race")
      Link: http://lkml.kernel.org/r/20181122161031.15179-1-andrea.parri@amarulasolutions.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ac8edc62
  4. 06 12月, 2018 12 次提交
    • S
      function_graph: Reverse the order of pushing the ret_stack and the callback · a22ff9df
      Steven Rostedt (VMware) 提交于
      commit 7c6ea35e upstream.
      
      The function graph profiler uses the ret_stack to store the "subtime" and
      reuse it by nested functions and also on the return. But the current logic
      has the profiler callback called before the ret_stack is updated, and it is
      just modifying the ret_stack that will later be allocated (it's just lucky
      that the "subtime" is not touched when it is allocated).
      
      This could also cause a crash if we are at the end of the ret_stack when
      this happens.
      
      By reversing the order of the allocating the ret_stack and then calling the
      callbacks attached to a function being traced, the ret_stack entry is no
      longer used before it is allocated.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a22ff9df
    • S
      function_graph: Move return callback before update of curr_ret_stack · d2bcf809
      Steven Rostedt (VMware) 提交于
      commit 552701dd0fa7c3d448142e87210590ba424694a0 upstream.
      
      In the past, curr_ret_stack had two functions. One was to denote the depth
      of the call graph, the other is to keep track of where on the ret_stack the
      data is used. Although they may be slightly related, there are two cases
      where they need to be used differently.
      
      The one case is that it keeps the ret_stack data from being corrupted by an
      interrupt coming in and overwriting the data still in use. The other is just
      to know where the depth of the stack currently is.
      
      The function profiler uses the ret_stack to save a "subtime" variable that
      is part of the data on the ret_stack. If curr_ret_stack is modified too
      early, then this variable can be corrupted.
      
      The "max_depth" option, when set to 1, will record the first functions going
      into the kernel. To see all top functions (when dealing with timings), the
      depth variable needs to be lowered before calling the return hook. But by
      lowering the curr_ret_stack, it makes the data on the ret_stack still being
      used by the return hook susceptible to being overwritten.
      
      Now that there's two variables to handle both cases (curr_ret_depth), we can
      move them to the locations where they can handle both cases.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d2bcf809
    • S
      function_graph: Have profiler use curr_ret_stack and not depth · aec14c81
      Steven Rostedt (VMware) 提交于
      commit b1b35f2e upstream.
      
      The profiler uses trace->depth to find its entry on the ret_stack, but the
      depth may not match the actual location of where its entry is (if an
      interrupt were to preempt the processing of the profiler for another
      function, the depth and the curr_ret_stack will be different).
      
      Have it use the curr_ret_stack as the index to find its ret_stack entry
      instead of using the depth variable, as that is no longer guaranteed to be
      the same.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aec14c81
    • S
      function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack · 39237432
      Steven Rostedt (VMware) 提交于
      commit 39eb456dacb543de90d3bc6a8e0ac5cf51ac475e upstream.
      
      Currently, the depth of the ret_stack is determined by curr_ret_stack index.
      The issue is that there's a race between setting of the curr_ret_stack and
      calling of the callback attached to the return of the function.
      
      Commit 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling
      trace return callback") moved the calling of the callback to after the
      setting of the curr_ret_stack, even stating that it was safe to do so, when
      in fact, it was the reason there was a barrier() there (yes, I should have
      commented that barrier()).
      
      Not only does the curr_ret_stack keep track of the current call graph depth,
      it also keeps the ret_stack content from being overwritten by new data.
      
      The function profiler, uses the "subtime" variable of ret_stack structure
      and by moving the curr_ret_stack, it allows for interrupts to use the same
      structure it was using, corrupting the data, and breaking the profiler.
      
      To fix this, there needs to be two variables to handle the call stack depth
      and the pointer to where the ret_stack is being used, as they need to change
      at two different locations.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      39237432
    • S
      function_graph: Make ftrace_push_return_trace() static · 72c33b23
      Steven Rostedt (VMware) 提交于
      commit d125f3f8 upstream.
      
      As all architectures now call function_graph_enter() to do the entry work,
      no architecture should ever call ftrace_push_return_trace(). Make it static.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      72c33b23
    • S
      function_graph: Create function_graph_enter() to consolidate architecture code · 67d7bec3
      Steven Rostedt (VMware) 提交于
      commit 8114865f upstream.
      
      Currently all the architectures do basically the same thing in preparing the
      function graph tracer on entry to a function. This code can be pulled into a
      generic location and then this will allow the function graph tracer to be
      fixed, as well as extended.
      
      Create a new function graph helper function_graph_enter() that will call the
      hook function (ftrace_graph_entry) and the shadow stack operation
      (ftrace_push_return_trace), and remove the need of the architecture code to
      manage the shadow stack.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      67d7bec3
    • T
      ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · aecb9969
      Thomas Gleixner 提交于
      commit 46f7ecb1 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aecb9969
    • T
      x86/speculation: Rework SMT state change · f55e301e
      Thomas Gleixner 提交于
      commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
      
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow finegrained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f55e301e
    • T
      sched/smt: Expose sched_smt_present static key · 340693ee
      Thomas Gleixner 提交于
      commit 321a874a upstream
      
      Make the scheduler's 'sched_smt_present' static key globaly available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      340693ee
    • P
      sched/smt: Make sched_smt_present track topology · a2c09481
      Peter Zijlstra (Intel) 提交于
      commit c5511d03ec090980732e929c318a7a6374b5550e upstream
      
      Currently the 'sched_smt_present' static key is enabled when at CPU bringup
      SMT topology is observed, but it is never disabled. However there is demand
      to also disable the key when the topology changes such that there is no SMT
      present anymore.
      
      Implement this by making the key count the number of cores that have SMT
      enabled.
      
      In particular, the SMT topology bits are set before interrrupts are enabled
      and similarly, are cleared after interrupts are disabled for the last time
      and the CPU dies.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NIngo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a2c09481
    • J
      x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · cacd9385
      Jiri Kosina 提交于
      commit dbfe2953 upstream
      
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processess (such as GPG) from data leaking into a different
      userspace process via spectre v2.
      
      This is however completely insufficient to provide proper userspace-to-userpace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processess where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: NTim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pmSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      cacd9385
    • J
      x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · b07fc04c
      Jiri Kosina 提交于
      commit 53c613fe upstream
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pmSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      b07fc04c
  5. 01 12月, 2018 3 次提交
    • P
      rcu: Make need_resched() respond to urgent RCU-QS needs · 016a8fc5
      Paul E. McKenney 提交于
      commit 92aa39e9 upstream.
      
      The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
      need for an RCU quiescent state from the force-quiescent-state processing
      within the grace-period kthread to context switches and to cond_resched().
      Unfortunately, such urgent needs are not communicated to need_resched(),
      which is sometimes used to decide when to invoke cond_resched(), for
      but one example, within the KVM vcpu_run() function.  As of v4.15, this
      can result in synchronize_sched() being delayed by up to ten seconds,
      which can be problematic, to say nothing of annoying.
      
      This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
      rcu_check_callbacks(), which is invoked from the scheduling-clock
      interrupt handler.  If the current task is not an idle task and is
      not executing in usermode, a context switch is forced, and either way,
      the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
      task is an idle task, then RCU's dyntick-idle code will detect the
      quiescent state, so no further action is required.  Similarly, if the
      task is executing in usermode, other code in rcu_check_callbacks() and
      its called functions will report the corresponding quiescent state.
      Reported-by: NMarius Hillenbrand <mhillenb@amazon.de>
      Reported-by: NDavid Woodhouse <dwmw2@infradead.org>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Backported to make patch apply cleanly on older versions. ]
      Tested-by: NMarius Hillenbrand <mhillenb@amazon.de>
      Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      016a8fc5
    • P
      kdb: Use strscpy with destination buffer size · 2bc40f89
      Prarit Bhargava 提交于
      [ Upstream commit c2b94c72 ]
      
      gcc 8.1.0 warns with:
      
      kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’:
      kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
           strncpy(prefix_name, name, strlen(name)+1);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/debug/kdb/kdb_support.c:239:31: note: length computed here
      
      Use strscpy() with the destination buffer size, and use ellipses when
      displaying truncated symbols.
      
      v2: Use strscpy()
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Toppins <jtoppins@redhat.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Reviewed-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NDaniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      2bc40f89
    • P
      sched/fair: Fix cpu_util_wake() for 'execl' type workloads · 08fbd4e0
      Patrick Bellasi 提交于
      [ Upstream commit c469933e ]
      
      A ~10% regression has been reported for UnixBench's execl throughput
      test by Aaron Lu and Ye Xiaolong:
      
        https://lkml.org/lkml/2018/10/30/765
      
      That test is pretty simple, it does a "recursive" execve() syscall on the
      same binary. Starting from the syscall, this sequence is possible:
      
         do_execve()
           do_execveat_common()
             __do_execve_file()
               sched_exec()
                 select_task_rq_fair()          <==| Task already enqueued
                   find_idlest_cpu()
                     find_idlest_group()
                       capacity_spare_wake()    <==| Functions not called from
      		   cpu_util_wake()           | the wakeup path
      
      which means we can end up calling cpu_util_wake() not only from the
      "wakeup path", as its name would suggest. Indeed, the task doing an
      execve() syscall is already enqueued on the CPU we want to get the
      cpu_util_wake() for.
      
      The estimated utilization for a CPU computed in cpu_util_wake() was
      written under the assumption that function can be called only from the
      wakeup path. If instead the task is already enqueued, we end up with a
      utilization which does not remove the current task's contribution from
      the estimated utilization of the CPU.
      This will wrongly assume a reduced spare capacity on the current CPU and
      increase the chances to migrate the task on execve.
      
      The regression is tracked down to:
      
       commit d519329f ("sched/fair: Update util_est only on util_avg updates")
      
      because in that patch we turn on by default the UTIL_EST sched feature.
      However, the real issue is introduced by:
      
       commit f9be3e59 ("sched/fair: Use util_est in LB and WU paths")
      
      Let's fix this by ensuring to always discount the task estimated
      utilization from the CPU's estimated utilization when the task is also
      the current one. The same benchmark of the bug report, executed on a
      dual socket 40 CPUs Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
      reports these "Execl Throughput" figures (higher the better):
      
         mainline     : 48136.5 lps
         mainline+fix : 55376.5 lps
      
      which correspond to a 15% speedup.
      
      Moreover, since {cpu_util,capacity_spare}_wake() are not really only
      used from the wakeup path, let's remove this ambiguity by using a better
      matching name: {cpu_util,capacity_spare}_without().
      
      Since we are at that, let's also improve the existing documentation.
      Reported-by: NAaron Lu <aaron.lu@intel.com>
      Reported-by: NYe Xiaolong <xiaolong.ye@intel.com>
      Tested-by: NAaron Lu <aaron.lu@intel.com>
      Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: f9be3e59 (sched/fair: Use util_est in LB and WU paths)
      Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/Signed-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      08fbd4e0
  6. 27 11月, 2018 2 次提交
    • V
      sched/core: Take the hotplug lock in sched_init_smp() · b1e814e4
      Valentin Schneider 提交于
      [ Upstream commit 40fa3780 ]
      
      When running on linux-next (8c60c36d0b8c ("Add linux-next specific files
      for 20181019")) + CONFIG_PROVE_LOCKING=y on a big.LITTLE system (e.g.
      Juno or HiKey960), we get the following report:
      
       [    0.748225] Call trace:
       [    0.750685]  lockdep_assert_cpus_held+0x30/0x40
       [    0.755236]  static_key_enable_cpuslocked+0x20/0xc8
       [    0.760137]  build_sched_domains+0x1034/0x1108
       [    0.764601]  sched_init_domains+0x68/0x90
       [    0.768628]  sched_init_smp+0x30/0x80
       [    0.772309]  kernel_init_freeable+0x278/0x51c
       [    0.776685]  kernel_init+0x10/0x108
       [    0.780190]  ret_from_fork+0x10/0x18
      
      The static_key in question is 'sched_asym_cpucapacity' introduced by
      commit:
      
        df054e84 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
      
      In this particular case, we enable it because smp_prepare_cpus() will
      end up fetching the capacity-dmips-mhz entry from the devicetree,
      so we already have some asymmetry detected when entering sched_init_smp().
      
      This didn't get detected in tip/sched/core because we were missing:
      
        commit cb538267 ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
      
      Calls to build_sched_domains() post sched_init_smp() will hold the
      hotplug lock, it just so happens that this very first call is a
      special case. As stated by a comment in sched_init_smp(), "There's no
      userspace yet to cause hotplug operations" so this is a harmless
      warning.
      
      However, to both respect the semantics of underlying
      callees and make lockdep happy, take the hotplug lock in
      sched_init_smp(). This also satisfies the comment atop
      sched_init_domains() that says "Callers must hold the hotplug lock".
      Reported-by: NSudeep Holla <sudeep.holla@arm.com>
      Tested-by: NSudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: NValentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar.Eggemann@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morten.rasmussen@arm.com
      Cc: quentin.perret@arm.com
      Link: http://lkml.kernel.org/r/1540301851-3048-1-git-send-email-valentin.schneider@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      b1e814e4
    • D
      bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv · 1a7ccf42
      Daniel Borkmann 提交于
      [ Upstream commit 28c2fae7 ]
      
      While dbecd738 ("bpf: get kernel symbol addresses via syscall")
      zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
      from unprivileged users, commit 815581c1 ("bpf: get JITed image
      lengths of functions via syscall") forgot about doing so and therefore
      returns the #elems of the user set up buffer which is incorrect. It
      also needs to indicate a info.nr_jited_func_lens of zero.
      
      Fixes: 815581c1 ("bpf: get JITed image lengths of functions via syscall")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      1a7ccf42
  7. 23 11月, 2018 1 次提交
  8. 21 11月, 2018 3 次提交
  9. 14 11月, 2018 12 次提交