1. 06 December 2018, 9 commits
    • function_graph: Use new curr_ret_depth to manage depth instead of curr_ret_stack · 39237432
      Submitted by Steven Rostedt (VMware)
      commit 39eb456dacb543de90d3bc6a8e0ac5cf51ac475e upstream.
      
      Currently, the depth of the ret_stack is determined by curr_ret_stack index.
      The issue is that there's a race between setting of the curr_ret_stack and
      calling of the callback attached to the return of the function.
      
      Commit 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling
      trace return callback") moved the calling of the callback to after the
      setting of the curr_ret_stack, even stating that it was safe to do so, when
      in fact, it was the reason there was a barrier() there (yes, I should have
      commented that barrier()).
      
      Not only does the curr_ret_stack keep track of the current call graph depth,
      it also keeps the ret_stack content from being overwritten by new data.
      
      The function profiler uses the "subtime" variable of the ret_stack structure,
      and by moving curr_ret_stack early, interrupts are allowed to reuse the same
      structure while it is still in use, corrupting the data and breaking the profiler.
      
      To fix this, there need to be two variables to handle the call stack depth
      and the pointer to where the ret_stack is being used, as they need to change
      at two different locations.
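      
      A rough sketch of that two-variable scheme (structure and field names here
      are simplified assumptions for illustration, not the actual kernel patch):
      the depth may drop before the return callback runs, but the index that
      reserves the ret_stack entry is released only afterwards.
      
         /* Simplified illustration of the two-variable scheme. */
         struct shadow_stack {
             int curr_ret_stack;   /* index of the ret_stack slot in use */
             int curr_ret_depth;   /* call-graph depth reported to callbacks */
         };
      
         static void graph_return_sketch(struct shadow_stack *s,
                                         void (*retcb)(int depth))
         {
             int index = s->curr_ret_stack;
      
             s->curr_ret_depth--;           /* depth can be adjusted early */
             barrier();
             retcb(s->curr_ret_depth);      /* callback still owns ret_stack[index] */
             barrier();
             s->curr_ret_stack = index - 1; /* only now may the slot be reused */
         }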
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • function_graph: Make ftrace_push_return_trace() static · 72c33b23
      Submitted by Steven Rostedt (VMware)
      commit d125f3f8 upstream.
      
      As all architectures now call function_graph_enter() to do the entry work,
      no architecture should ever call ftrace_push_return_trace(). Make it static.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • function_graph: Create function_graph_enter() to consolidate architecture code · 67d7bec3
      Submitted by Steven Rostedt (VMware)
      commit 8114865f upstream.
      
      Currently all the architectures do basically the same thing to prepare the
      function graph tracer on entry to a function. This code can be pulled into a
      generic location, which will then allow the function graph tracer to be
      fixed as well as extended.
      
      Create a new function graph helper function_graph_enter() that will call the
      hook function (ftrace_graph_entry) and the shadow stack operation
      (ftrace_push_return_trace), and remove the need of the architecture code to
      manage the shadow stack.
      
      This is needed to prepare for a fix of a design bug on how the curr_ret_stack
      is used.
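      
      As a rough sketch (argument lists and error handling are approximated, not
      the upstream code), the consolidated helper boils down to calling the entry
      hook and, only if it accepts the function, doing the shadow stack push:
      
         int function_graph_enter_sketch(unsigned long ret, unsigned long func,
                                         unsigned long frame_pointer,
                                         unsigned long *retp)
         {
             struct ftrace_graph_ent trace;
      
             trace.func = func;
             trace.depth = current->curr_ret_stack + 1;
      
             /* Only trace if the attached entry hook expects to. */
             if (!ftrace_graph_entry(&trace))
                 return -EBUSY;
      
             /* Shadow stack operation, no longer open-coded per architecture. */
             return ftrace_push_return_trace(ret, func, frame_pointer, retp);
         }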
      
      Cc: stable@kernel.org
      Fixes: 03274a3f ("tracing/fgraph: Adjust fgraph depth before calling trace return callback")
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ptrace: Remove unused ptrace_may_access_sched() and MODE_IBRS · aecb9969
      Submitted by Thomas Gleixner
      commit 46f7ecb1 upstream
      
      The IBPB control code in x86 removed the usage. Remove the functionality
      which was introduced for this.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185005.559149393@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/speculation: Rework SMT state change · f55e301e
      Submitted by Thomas Gleixner
      commit a74cfffb03b73d41e08f84c2e5c87dec0ce3db9f upstream
      
      arch_smt_update() is only called when the sysfs SMT control knob is
      changed. This means that when SMT is enabled in the sysfs control knob the
      system is considered to have SMT active even if all siblings are offline.
      
      To allow fine-grained control of the speculation mitigations, the actual SMT
      state is more interesting than the fact that siblings could be enabled.
      
      Rework the code, so arch_smt_update() is invoked from each individual CPU
      hotplug function, and simplify the update function while at it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.521974984@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/smt: Expose sched_smt_present static key · 340693ee
      Submitted by Thomas Gleixner
      commit 321a874a upstream
      
      Make the scheduler's 'sched_smt_present' static key globally available, so
      it can be used in the x86 speculation control code.
      
      Provide a query function and a stub for the CONFIG_SMP=n case.
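      
      A sketch of what the query plus stub can look like (the config guard and
      exact declarations follow the changelog description and are approximations
      of the tree):
      
         #ifdef CONFIG_SMP
         extern struct static_key_false sched_smt_present;
      
         static __always_inline bool sched_smt_active(void)
         {
             return static_branch_likely(&sched_smt_present);
         }
         #else
         static inline bool sched_smt_active(void) { return false; }
         #endif
      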
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.430168326@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sched/smt: Make sched_smt_present track topology · a2c09481
      Submitted by Peter Zijlstra (Intel)
      commit c5511d03ec090980732e929c318a7a6374b5550e upstream
      
      Currently the 'sched_smt_present' static key is enabled when SMT topology is
      observed at CPU bringup, but it is never disabled. However, there is demand
      to also disable the key when the topology changes such that there is no SMT
      present anymore.
      
      Implement this by making the key count the number of cores that have SMT
      enabled.
      
      In particular, the SMT topology bits are set before interrupts are enabled
      and, similarly, are cleared after interrupts are disabled for the last time
      and the CPU dies.
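      
      A sketch of the counting approach (the helper names below are invented for
      illustration; cpu_smt_mask() and the static-branch helpers are the real
      kernel APIs):
      
         /* CPU online path: a core gains SMT when its sibling mask first
          * reaches two online CPUs. */
         static void smt_present_inc_sketch(int cpu)
         {
             if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
                 static_branch_inc_cpuslocked(&sched_smt_present);
         }
      
         /* CPU offline path, after interrupts are disabled for the last time. */
         static void smt_present_dec_sketch(int cpu)
         {
             if (cpumask_weight(cpu_smt_mask(cpu)) == 2)
                 static_branch_dec_cpuslocked(&sched_smt_present);
         }
      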
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: David Woodhouse <dwmw@amazon.co.uk>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Casey Schaufler <casey.schaufler@intel.com>
      Cc: Asit Mallick <asit.k.mallick@intel.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Waiman Long <longman9394@gmail.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Dave Stewart <david.c.stewart@intel.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181125185004.246110444@linutronix.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/speculation: Apply IBPB more strictly to avoid cross-process data leak · cacd9385
      Submitted by Jiri Kosina
      commit dbfe2953 upstream
      
      Currently, IBPB is only issued in cases when switching into a non-dumpable
      process, the rationale being to protect such 'important and security
      sensitive' processes (such as GPG) from data leaking into a different
      userspace process via spectre v2.
      
      This is however completely insufficient to provide proper userspace-to-userspace
      spectrev2 protection, as any process can poison branch buffers before being
      scheduled out, and the newly scheduled process immediately becomes a spectrev2
      victim.
      
      In order to minimize the performance impact (for usecases that do require
      spectrev2 protection), issue the barrier only in cases when switching between
      processes where the victim can't be ptraced by the potential attacker (as in
      such cases, the attacker doesn't have to bother with branch buffers at all).
      
      [ tglx: Split up PTRACE_MODE_NOACCESS_CHK into PTRACE_MODE_SCHED and
        PTRACE_MODE_IBPB to be able to do ptrace() context tracking reasonably
        fine-grained ]
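      
      The decision described above boils down to something like the sketch below
      (task_may_ptrace() is an assumed helper used only for illustration;
      indirect_branch_prediction_barrier() is the existing x86 helper):
      
         /* Sketch of the policy, not the actual context-switch code. */
         static void cond_ibpb_sketch(struct task_struct *prev,
                                      struct task_struct *next)
         {
             /* If the incoming task could ptrace the outgoing one anyway,
              * poisoned branch buffers buy the attacker nothing, so the
              * (expensive) barrier can be skipped in that case. */
             if (!task_may_ptrace(next, prev))
                 indirect_branch_prediction_barrier();
         }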
      
      Fixes: 18bf3c3e ("x86/speculation: Use Indirect Branch Prediction Barrier in context switch")
      Originally-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251437340.15880@cbobk.fhfr.pm
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
    • x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation · b07fc04c
      Submitted by Jiri Kosina
      commit 53c613fe upstream
      
      STIBP is a feature provided by certain Intel ucodes / CPUs. This feature
      (once enabled) prevents cross-hyperthread control of decisions made by
      indirect branch predictors.
      
      Enable this feature if
      
      - the CPU is vulnerable to spectre v2
      - the CPU supports SMT and has SMT siblings online
      - spectre_v2 mitigation autoselection is enabled (default)
      
      After some previous discussion, this leaves STIBP on all the time, as wrmsr
      on crossing kernel boundary is a no-no. This could perhaps later be a bit
      more optimized (like disabling it in NOHZ, experiment with disabling it in
      idle, etc) if needed.
      
      Note that the synchronization of the mask manipulation via newly added
      spec_ctrl_mutex is currently not strictly needed, as the only updater is
      already being serialized by cpu_add_remove_lock, but let's make this a
      little bit more future-proof.
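      
      Put together, the enable decision reduces to roughly the sketch below (names
      simplified; the online-sibling check is expressed via sched_smt_active(),
      which may not be where the actual code performs it):
      
         static bool stibp_needed_sketch(void)
         {
             if (spectre_v2_enabled == SPECTRE_V2_NONE)  /* mitigation disabled */
                 return false;
             if (!boot_cpu_has(X86_FEATURE_STIBP))       /* no ucode/CPU support */
                 return false;
             return sched_smt_active();                  /* SMT siblings online */
         }
      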
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc:  "WoodhouseDavid" <dwmw@amazon.co.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc:  "SchauflerCasey" <casey.schaufler@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/nycvar.YFH.7.76.1809251438240.15880@cbobk.fhfr.pm
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
  2. 01 December 2018, 3 commits
    • rcu: Make need_resched() respond to urgent RCU-QS needs · 016a8fc5
      Submitted by Paul E. McKenney
      commit 92aa39e9 upstream.
      
      The per-CPU rcu_dynticks.rcu_urgent_qs variable communicates an urgent
      need for an RCU quiescent state from the force-quiescent-state processing
      within the grace-period kthread to context switches and to cond_resched().
      Unfortunately, such urgent needs are not communicated to need_resched(),
      which is sometimes used to decide when to invoke cond_resched(), for
      but one example, within the KVM vcpu_run() function.  As of v4.15, this
      can result in synchronize_sched() being delayed by up to ten seconds,
      which can be problematic, to say nothing of annoying.
      
      This commit therefore checks rcu_dynticks.rcu_urgent_qs from within
      rcu_check_callbacks(), which is invoked from the scheduling-clock
      interrupt handler.  If the current task is not an idle task and is
      not executing in usermode, a context switch is forced, and either way,
      the rcu_dynticks.rcu_urgent_qs variable is set to false.  If the current
      task is an idle task, then RCU's dyntick-idle code will detect the
      quiescent state, so no further action is required.  Similarly, if the
      task is executing in usermode, other code in rcu_check_callbacks() and
      its called functions will report the corresponding quiescent state.
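      
      A simplified sketch of the check described above (the per-CPU variable and
      helpers follow the changelog; treat the exact placement and names as
      assumptions):
      
         /* Invoked from the scheduling-clock interrupt path (sketch only). */
         static void rcu_urgent_qs_check_sketch(int user)
         {
             if (!this_cpu_read(rcu_dynticks.rcu_urgent_qs))
                 return;
      
             /* Idle tasks and usermode execution are already quiescent states. */
             if (!is_idle_task(current) && !user) {
                 set_tsk_need_resched(current);  /* makes need_resched() true */
                 set_preempt_need_resched();
             }
             this_cpu_write(rcu_dynticks.rcu_urgent_qs, false);
         }
      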
      Reported-by: Marius Hillenbrand <mhillenb@amazon.de>
      Reported-by: David Woodhouse <dwmw2@infradead.org>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Backported to make patch apply cleanly on older versions. ]
      Tested-by: Marius Hillenbrand <mhillenb@amazon.de>
      Cc: <stable@vger.kernel.org> # 4.12.x - 4.19.x
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • kdb: Use strscpy with destination buffer size · 2bc40f89
      Submitted by Prarit Bhargava
      [ Upstream commit c2b94c72 ]
      
      gcc 8.1.0 warns with:
      
      kernel/debug/kdb/kdb_support.c: In function ‘kallsyms_symbol_next’:
      kernel/debug/kdb/kdb_support.c:239:4: warning: ‘strncpy’ specified bound depends on the length of the source argument [-Wstringop-overflow=]
           strncpy(prefix_name, name, strlen(name)+1);
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      kernel/debug/kdb/kdb_support.c:239:31: note: length computed here
      
      Use strscpy() with the destination buffer size, and use ellipses when
      displaying truncated symbols.
      
      v2: Use strscpy()
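      
      A sketch of the pattern (the buffer size and the printing helper are
      illustrative assumptions, not the exact kdb code):
      
         char prefix_name[128];   /* destination size assumed for illustration */
      
         /* Bound the copy by the destination buffer, not by strlen(name);
          * strscpy() returns -E2BIG when the source had to be truncated,
          * which the display code can mark with an ellipsis. */
         if (strscpy(prefix_name, name, sizeof(prefix_name)) < 0)
             kdb_printf("%s...", prefix_name);
      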
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Toppins <jtoppins@redhat.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: kgdb-bugreport@lists.sourceforge.net
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched/fair: Fix cpu_util_wake() for 'execl' type workloads · 08fbd4e0
      Submitted by Patrick Bellasi
      [ Upstream commit c469933e ]
      
      A ~10% regression has been reported for UnixBench's execl throughput
      test by Aaron Lu and Ye Xiaolong:
      
        https://lkml.org/lkml/2018/10/30/765
      
      That test is pretty simple, it does a "recursive" execve() syscall on the
      same binary. Starting from the syscall, this sequence is possible:
      
         do_execve()
           do_execveat_common()
             __do_execve_file()
               sched_exec()
                 select_task_rq_fair()          <==| Task already enqueued
                   find_idlest_cpu()
                     find_idlest_group()
                       capacity_spare_wake()    <==| Functions not called from
      		   cpu_util_wake()           | the wakeup path
      
      which means we can end up calling cpu_util_wake() not only from the
      "wakeup path", as its name would suggest. Indeed, the task doing an
      execve() syscall is already enqueued on the CPU we want to get the
      cpu_util_wake() for.
      
      The estimated utilization for a CPU computed in cpu_util_wake() was
      written under the assumption that the function can be called only from the
      wakeup path. If instead the task is already enqueued, we end up with a
      utilization which does not remove the current task's contribution from
      the estimated utilization of the CPU.
      This wrongly assumes a reduced spare capacity on the current CPU and
      increases the chances of migrating the task on execve.
      
      The regression is tracked down to:
      
       commit d519329f ("sched/fair: Update util_est only on util_avg updates")
      
      because in that patch we turn on by default the UTIL_EST sched feature.
      However, the real issue is introduced by:
      
       commit f9be3e59 ("sched/fair: Use util_est in LB and WU paths")
      
      Let's fix this by ensuring to always discount the task estimated
      utilization from the CPU's estimated utilization when the task is also
      the current one. The same benchmark from the bug report, executed on a
      dual-socket, 40-CPU Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz machine,
      reports these "Execl Throughput" figures (higher is better):
      
         mainline     : 48136.5 lps
         mainline+fix : 55376.5 lps
      
      which correspond to a 15% speedup.
      
      Moreover, since {cpu_util,capacity_spare}_wake() are not really only
      used from the wakeup path, let's remove this ambiguity by using a better
      matching name: {cpu_util,capacity_spare}_without().
      
      While we are at it, let's also improve the existing documentation.
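      
      The discounting idea can be sketched as follows (cpu_util() and
      task_util_est() stand in for the fair-class helpers; this illustrates the
      intent of the fix, not the patch itself):
      
         static unsigned long cpu_util_without_sketch(int cpu, struct task_struct *p)
         {
             /* Current utilization of the CPU; if p runs here, its own
              * contribution is still included in this number. */
             unsigned long util = cpu_util(cpu);
      
             /* Discount p's estimated utilization when asking about the CPU it
              * currently occupies, e.g. from sched_exec(), so the spare capacity
              * is not underestimated. */
             if (cpu == task_cpu(p))
                 util -= min(util, task_util_est(p));
      
             return util;
         }
      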
      Reported-by: Aaron Lu <aaron.lu@intel.com>
      Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
      Tested-by: Aaron Lu <aaron.lu@intel.com>
      Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Perret <quentin.perret@arm.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@google.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Fixes: f9be3e59 (sched/fair: Use util_est in LB and WU paths)
      Link: https://lore.kernel.org/lkml/20181025093100.GB13236@e110439-lin/
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  3. 27 November 2018, 2 commits
    • sched/core: Take the hotplug lock in sched_init_smp() · b1e814e4
      Submitted by Valentin Schneider
      [ Upstream commit 40fa3780 ]
      
      When running on linux-next (8c60c36d0b8c ("Add linux-next specific files
      for 20181019")) + CONFIG_PROVE_LOCKING=y on a big.LITTLE system (e.g.
      Juno or HiKey960), we get the following report:
      
       [    0.748225] Call trace:
       [    0.750685]  lockdep_assert_cpus_held+0x30/0x40
       [    0.755236]  static_key_enable_cpuslocked+0x20/0xc8
       [    0.760137]  build_sched_domains+0x1034/0x1108
       [    0.764601]  sched_init_domains+0x68/0x90
       [    0.768628]  sched_init_smp+0x30/0x80
       [    0.772309]  kernel_init_freeable+0x278/0x51c
       [    0.776685]  kernel_init+0x10/0x108
       [    0.780190]  ret_from_fork+0x10/0x18
      
      The static_key in question is 'sched_asym_cpucapacity' introduced by
      commit:
      
        df054e84 ("sched/topology: Add static_key for asymmetric CPU capacity optimizations")
      
      In this particular case, we enable it because smp_prepare_cpus() will
      end up fetching the capacity-dmips-mhz entry from the devicetree,
      so we already have some asymmetry detected when entering sched_init_smp().
      
      This didn't get detected in tip/sched/core because we were missing:
      
        commit cb538267 ("jump_label/lockdep: Assert we hold the hotplug lock for _cpuslocked() operations")
      
      Calls to build_sched_domains() after sched_init_smp() will hold the
      hotplug lock; it just so happens that this very first call is a
      special case. As stated by a comment in sched_init_smp(), "There's no
      userspace yet to cause hotplug operations", so this is a harmless
      warning.
      
      However, to both respect the semantics of underlying
      callees and make lockdep happy, take the hotplug lock in
      sched_init_smp(). This also satisfies the comment atop
      sched_init_domains() that says "Callers must hold the hotplug lock".
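      
      The fix amounts to wrapping the first domain build in the hotplug read
      lock, roughly as sketched below (a simplified excerpt-style illustration,
      not the literal diff):
      
         /* In sched_init_smp(): take the hotplug lock around the first domain
          * build so the _cpuslocked() callees see it held. */
         cpus_read_lock();
         mutex_lock(&sched_domains_mutex);
         sched_init_domains(cpu_active_mask);
         mutex_unlock(&sched_domains_mutex);
         cpus_read_unlock();
      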
      Reported-by: Sudeep Holla <sudeep.holla@arm.com>
      Tested-by: Sudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar.Eggemann@arm.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: morten.rasmussen@arm.com
      Cc: quentin.perret@arm.com
      Link: http://lkml.kernel.org/r/1540301851-3048-1-git-send-email-valentin.schneider@arm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv · 1a7ccf42
      Submitted by Daniel Borkmann
      [ Upstream commit 28c2fae7 ]
      
      While dbecd738 ("bpf: get kernel symbol addresses via syscall")
      zeroed info.nr_jited_ksyms in bpf_prog_get_info_by_fd() for queries
      from unprivileged users, commit 815581c1 ("bpf: get JITed image
      lengths of functions via syscall") forgot to do so and therefore
      returns the number of elements of the user-provided buffer, which is
      incorrect. It also needs to indicate an info.nr_jited_func_lens of zero.
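      
      A sketch of the unprivileged path in bpf_prog_get_info_by_fd() after the
      fix (simplified; treat the exact field set as an approximation of the tree):
      
         if (!capable(CAP_SYS_ADMIN)) {
             /* Unprivileged callers get no JIT details at all. */
             info.jited_prog_len = 0;
             info.xlated_prog_len = 0;
             info.nr_jited_ksyms = 0;
             info.nr_jited_func_lens = 0;  /* the missing piece this fix adds */
             goto done;
         }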
      
      Fixes: 815581c1 ("bpf: get JITed image lengths of functions via syscall")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Cc: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
  4. 23 November 2018, 1 commit
  5. 21 November 2018, 3 commits
  6. 14 November 2018, 15 commits
  7. 20 October 2018, 2 commits
  8. 18 October 2018, 2 commits
    • tracing: Use trace_clock_local() for looping in preemptirq_delay_test.c · 12ad0cb2
      Submitted by Steven Rostedt (VMware)
      The preemptirq_delay_test module is used for the ftrace selftest code that
      tests the latency tracers. The problem is that it uses ktime for the delay
      loop, and then checks the tracer to see if the delay loop is caught, but the
      tracer uses trace_clock_local() which uses various different other clocks to
      measure the latency. As ktime uses the clock cycles, and the code then
      converts that to nanoseconds, it causes rounding errors, and the preemptirq
      latency tests are failing due to being off by 1 (it expects to see a delay
      of 500000 us, but the delay is only 499999 us). This is happening due to a
      rounding error in the ktime (which is totally legit). The purpose of the
      test is to see if it can catch the delay, not to test the accuracy between
      trace_clock_local() and ktime_get(). Best to use apples to apples, and have
      the delay loop use the same clock as the latency tracer does.
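      
      A sketch of a delay loop that uses the tracer's clock (simplified from the
      module; the exact conversion and loop termination are assumptions):
      
         static void busy_wait_sketch(unsigned long usecs)
         {
             u64 start = trace_clock_local();   /* same clock the tracer uses */
             u64 end = start + usecs * 1000;    /* microseconds -> nanoseconds */
      
             while (trace_clock_local() < end)
                 cpu_relax();
         }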
      
      Cc: stable@vger.kernel.org
      Fixes: f96e8577 ("lib: Add module for testing preemptoff/irqsoff latency tracers")
      Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • tracepoint: Fix tracepoint array element size mismatch · 9c0be3f6
      Submitted by Mathieu Desnoyers
      commit 46e0c9be ("kernel: tracepoints: add support for relative
      references") changes the layout of the __tracepoint_ptrs section on
      architectures supporting relative references. However, it does so
      without turning struct tracepoint * const into const int elsewhere in
      the tracepoint code, which has the following side-effect:
      
      Setting mod->num_tracepoints is done by module.c:
      
          mod->tracepoints_ptrs = section_objs(info, "__tracepoints_ptrs",
                                               sizeof(*mod->tracepoints_ptrs),
                                               &mod->num_tracepoints);
      
      Basically, since sizeof(*mod->tracepoints_ptrs) is a pointer size
      (rather than sizeof(int)), num_tracepoints is erroneously set to half the
      size it should be on a 64-bit arch. So a module with an odd number of
      tracepoints misses the last tracepoint due to the effect of integer
      division.
      
      So in the module going notifier:
      
              for_each_tracepoint_range(mod->tracepoints_ptrs,
                      mod->tracepoints_ptrs + mod->num_tracepoints,
                      tp_module_going_check_quiescent, NULL);
      
      the expression (mod->tracepoints_ptrs + mod->num_tracepoints) actually
      evaluates to something within the bounds of the array, but misses the
      last tracepoint if the number of tracepoints is odd on a 64-bit arch.
      
      Fix this by introducing a new typedef: tracepoint_ptr_t, which
      is either "const int" on architectures that have PREL32 relocations,
      or "struct tracepoint * const" on architectures that does not have
      this feature.
      
      Also provide a new tracepoint_ptr_deref() static inline to
      encapsulate dereferencing this type rather than duplicating code and
      ugly ifdefs within the for_each_tracepoint_range() implementation.
      
      This issue appears in 4.19-rc kernels, and should ideally be fixed
      before the end of the rc cycle.
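      
      The shape of the new type and its dereference helper can be sketched as
      follows (an approximation of the header change; offset_to_ptr() is the
      existing helper for PREL32 relative references):
      
         #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
         typedef const int tracepoint_ptr_t;           /* 32-bit relative reference */
         #else
         typedef struct tracepoint * const tracepoint_ptr_t;
         #endif
      
         static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
         {
         #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
             return offset_to_ptr(p);
         #else
             return *p;
         #endif
         }
      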
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Jessica Yu <jeyu@kernel.org>
      Link: http://lkml.kernel.org/r/20181013191050.22389-1-mathieu.desnoyers@efficios.com
      Link: http://lkml.kernel.org/r/20180704083651.24360-7-ard.biesheuvel@linaro.org
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morris <james.morris@microsoft.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Nicolas Pitre <nico@linaro.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Thomas Garnier <thgarnie@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  9. 16 October 2018, 1 commit
  10. 11 October 2018, 2 commits
    • sched/fair: Fix throttle_list starvation with low CFS quota · baa9be4f
      Submitted by Phil Auld
      With a very low cpu.cfs_quota_us setting, such as the minimum of 1000,
      distribute_cfs_runtime may not empty the throttled_list before it runs
      out of runtime to distribute. In that case, due to the change from
      c06f04c7 to put throttled entries at the head of the list, later entries
      on the list will starve. Essentially, the same X processes will get pulled
      off the list, given CPU time, and then, when expired, get put back on the
      head of the list, where distribute_cfs_runtime will again give runtime to the
      same set of processes, leaving the rest to starve.
      
      Fix the issue by setting a bit in struct cfs_bandwidth when
      distribute_cfs_runtime is running, so that the code in throttle_cfs_rq can
      decide to put the throttled entry on the tail or the head of the list.  The
      bit is set/cleared by the callers of distribute_cfs_runtime while they hold
      cfs_bandwidth->lock.
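      
      The resulting queueing decision in throttle_cfs_rq() can be sketched as
      below (the flag name distribute_running is an assumption based on the
      description above):
      
         if (cfs_b->distribute_running)
             /* Head: the in-flight distribute pass will not revisit us. */
             list_add_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
         else
             /* Tail: keep FIFO order so later runqueues are not starved. */
             list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);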
      
      This is easy to reproduce with a handful of CPU consumers. I use 'crash' on
      the live system. In some cases you can simply look at the throttled list and
      see the later entries are not changing:
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -976050
          2     ffff90b56cb2cc00  -484925
          3     ffff90b56cb2bc00  -658814
          4     ffff90b56cb2ba00  -275365
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
        crash> list cfs_rq.throttled_list -H 0xffff90b54f6ade40 -s cfs_rq.runtime_remaining | paste - - | awk '{print $1"  "$4}' | pr -t -n3
          1     ffff90b56cb2d200  -994147
          2     ffff90b56cb2cc00  -306051
          3     ffff90b56cb2bc00  -961321
          4     ffff90b56cb2ba00  -24490
          5     ffff90b166a45600  -135138
          6     ffff90b56cb2da00  -282505
          7     ffff90b56cb2e000  -148065
          8     ffff90b56cb2fa00  -872591
          9     ffff90b56cb2c000  -84687
         10     ffff90b56cb2f000  -87237
         11     ffff90b166a40a00  -164582
      
      Sometimes it is easier to see by finding a process getting starved and looking
      at the sched_info:
      
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
        crash> task ffff8eb765994500 sched_info
        PID: 7800   TASK: ffff8eb765994500  CPU: 16  COMMAND: "cputest"
          sched_info = {
            pcount = 8,
            run_delay = 697094208,
            last_arrival = 240260125039,
            last_queued = 240260327513
          },
      Signed-off-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Ben Segall <bsegall@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop")
      Link: http://lkml.kernel.org/r/20181008143639.GA4019@pauld.bos.csb
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • xsk: do not call synchronize_net() under RCU read lock · cee27167
      Submitted by Björn Töpel
      The XSKMAP update and delete functions called synchronize_net(), which
      can sleep. It is not allowed to sleep during an RCU read section.
      
      Instead we need to make sure that the sock sk_destruct (xsk_destruct)
      function is asynchronously called after an RCU grace period. Setting
      the SOCK_RCU_FREE flag for XDP sockets takes care of this.
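      
      The core of the fix is a one-flag change at socket creation time, roughly
      (sketched from the description; the surrounding AF_XDP create path is
      assumed):
      
         /* Mark the XDP socket so its sk_destruct runs after an RCU grace
          * period, removing the need for synchronize_net() in map update and
          * delete paths that may run under rcu_read_lock(). */
         sock_set_flag(sk, SOCK_RCU_FREE);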
      
      Fixes: fbfc504a ("bpf: introduce new bpf AF_XDP map type BPF_MAP_TYPE_XSKMAP")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>