- 17 July 2018, 2 commits
-
-
By Andrea Parri

There are 11 interpretations of the requirements described in the header comment for smp_mb__after_spinlock(): one for each LKMM maintainer, and one currently encoded in the Cat file. Stick to the latter (until a more satisfactory solution is available).

This also reworks some snippets related to the barrier to illustrate the requirements and to link them to the idioms which are relied upon at its call sites.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: akiyks@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: stern@rowland.harvard.edu
Link: http://lkml.kernel.org/r/20180716180605.16115-11-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Andrea Parri

woken_wake_function() synchronizes with wait_woken() as follows:

    [wait_woken]                       [woken_wake_function]

    entry->flags &= ~wq_flag_woken;    condition = true;
    smp_mb();                          smp_wmb();
    if (condition)                     wq_entry->flags |= wq_flag_woken;
        break;

This commit replaces the above smp_wmb() with an smp_mb() in order to guarantee that either wait_woken() sees the wait condition being true, or the store to wq_entry->flags in woken_wake_function() follows the store in wait_woken() in the coherence order (so that the former can eventually be observed by wait_woken()).

The commit also fixes a comment associated with set_current_state() in wait_woken(): the comment pairs the barrier in set_current_state() with the above smp_wmb(), while the actual pairing involves the barrier in set_current_state() and the barrier executed by the try_to_wake_up() in woken_wake_function().

Signed-off-by: Andrea Parri <andrea.parri@amarulasolutions.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akiyks@gmail.com
Cc: boqun.feng@gmail.com
Cc: dhowells@redhat.com
Cc: j.alglave@ucl.ac.uk
Cc: linux-arch@vger.kernel.org
Cc: luc.maranget@inria.fr
Cc: npiggin@gmail.com
Cc: parri.andrea@gmail.com
Cc: stern@rowland.harvard.edu
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/20180716180605.16115-10-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 13 July 2018, 1 commit
-
-
By Joel Fernandes (Google)

Currently ftrace displays data in trace output like so:

                                          _-----=> irqs-off
                                         / _----=> need-resched
                                        | / _---=> hardirq/softirq
                                        || / _--=> preempt-depth
                                        ||| /     delay
               TASK-PID   CPU    TGID   ||||    TIMESTAMP  FUNCTION
                  | |       |      |    ||||       |         |
              bash-1091  [000] ( 1091) d..2 28.313544: sched_switch:

However, Android's trace visualization tools expect a slightly different format due to an out-of-tree patch that has been carried for a decade. Notice that the TGID and CPU fields are reversed:

                                          _-----=> irqs-off
                                         / _----=> need-resched
                                        | / _---=> hardirq/softirq
                                        || / _--=> preempt-depth
                                        ||| /     delay
               TASK-PID    TGID   CPU   ||||    TIMESTAMP  FUNCTION
                  | |        |     |    ||||       |         |
              bash-1091  ( 1091) [002] d..2 64.965177: sched_switch:

TGID display was introduced in kernel v4.13, so from v4.13 onwards tracing with systrace on all Android kernels will break (most Android kernels have been on 4.9 with Android patches, so this issue hasn't been seen yet). The Chrome browser's tracing tools also embed the systrace viewer, which uses the legacy TGID format, and updates to it are known to be difficult to make.

Considering this, I suggest we make this change to the upstream kernel and backport it to all Android kernels. I believe this feature was merged recently enough into the upstream kernel that it shouldn't be a problem. Also logically, IMO it makes more sense to group the TGID with the TASK-PID and place the CPU after these.

Link: http://lkml.kernel.org/r/20180626000822.113931-1-joel@joelfernandes.org
Cc: jreck@google.com
Cc: tkjos@google.com
Cc: stable@vger.kernel.org
Fixes: 441dae8f ("tracing: Add support for display of tgid in trace output")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 12 July 2018, 1 commit
-
-
By Jiri Olsa

We don't release tk->tp.call.print_fmt when destroying a local uprobe. There is also a missing print_fmt kfree in the create_local_trace_kprobe() error path.

Link: http://lkml.kernel.org/r/20180709141906.2390-1-jolsa@kernel.org
Cc: stable@vger.kernel.org
Fixes: e12f03d7 ("perf/core: Implement the 'perf_kprobe' PMU")
Acked-by: Song Liu <songliubraving@fb.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 11 July 2018, 5 commits
-
-
By Mathieu Desnoyers

Declaring the rseq_cs field as a union between __u64 and two __u32 allows both 32-bit and 64-bit kernels to read the full __u64, and therefore validate that a 32-bit user-space cleared the upper 32 bits, thus ensuring consistent behavior between native 32-bit kernels and 32-bit compat tasks on 64-bit kernels.

Check that the rseq_cs value read is < TASK_SIZE.

The asm/byteorder.h header needs to be included by rseq.h, now that it is not using linux/types_32_64.h anymore.

Considering that only __u32 and __u64 types are declared in linux/rseq.h, the linux/types.h header should always be included for both kernel and user-space code: including stdint.h is just for u64 and u32, which are not used in this header at all.

Use copy_from_user()/clear_user() to interact with a 64-bit field, because arm32 does not implement 64-bit __get_user(), and ppc32 does not implement 64-bit get_user(). Considering that the rseq_cs pointer does not need to be loaded/stored with single-copy atomicity from the kernel anymore, we can simply use copy_from_user()/clear_user().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-5-mathieu.desnoyers@efficios.com
-
By Mathieu Desnoyers

Update the rseq uapi header comments to reflect that user-space needs to perform thread-local loads/stores from/to the struct rseq fields. As a consequence of this added requirement, the kernel does not need to perform loads/stores with single-copy atomicity.

Update the comment associated with the "flags" field to describe more accurately that it's only useful to facilitate single-stepping through rseq critical sections with debuggers.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-4-mathieu.desnoyers@efficios.com
-
By Mathieu Desnoyers

__get_user()/__put_user() are used to read values for address ranges that were already checked with access_ok() on rseq registration.

It has been recognized that __get_user()/__put_user() are optimizing the wrong thing. Replace them by get_user()/put_user() across rseq instead.

If those end up showing up in benchmarks, the proper approach would be to use user_access_begin() / unsafe_{get,put}_user() / user_access_end() anyway.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lkml.kernel.org/r/20180709195155.7654-3-mathieu.desnoyers@efficios.com
-
By Mathieu Desnoyers

Change the rseq ABI so the rseq_cs start_ip, post_commit_offset and abort_ip fields are seen as 64-bit fields by both 32-bit and 64-bit kernels, rather than ignoring the upper 32 bits on 32-bit kernels. This ensures consistent behavior for a 32-bit binary executed on 32-bit kernels and in compat mode on 64-bit kernels.

Validating the value of the abort_ip field to be below TASK_SIZE ensures the kernel doesn't return to an invalid address when returning to userspace after an abort. I don't fully trust each architecture's code to consistently deal with invalid return addresses.

Validating the value of the start_ip and post_commit_offset fields prevents overflow on arithmetic performed on those values, used to check whether abort_ip is within the rseq critical section.

If validation fails, the process is killed with a segmentation fault.

When the signature encountered before abort_ip does not match the expected signature, return -EINVAL rather than -EPERM to be consistent with other input validation return codes from rseq_get_rseq_cs().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-api@vger.kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Paul Turner <pjt@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Chris Lameter <cl@linux.com>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: https://lkml.kernel.org/r/20180709195155.7654-2-mathieu.desnoyers@efficios.com
-
By Sudeep Holla

This reverts commit 1332a905.

The original issue was not caused by incorrect checking of the cpumask for both the new and the old tick device. It was incorrectly analysed due to a misunderstanding of the comment and a misinterpretation of the return value from tick_check_preferred(). The main issue is with clockevent drivers that set the cpumask to cpu_all_mask instead of cpu_possible_mask.

Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kevin Hilman <khilman@baylibre.com>
Tested-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Marc Zyngier <marc.zyngier@arm.com>
Link: https://lkml.kernel.org/r/1531151136-18297-1-git-send-email-sudeep.holla@arm.com
-
- 04 July 2018, 6 commits
-
-
By Changbin Du

The function_graph tracer does not show the interrupt return marker for the leaf entry. On leaf entries, we see an unbalanced interrupt marker (the interrupt was entered, but never left).

Before:

    1)               |  SyS_write() {
    1)               |    __fdget_pos() {
    1)   0.061 us    |      __fget_light();
    1)   0.289 us    |    }
    1)               |    vfs_write() {
    1)   0.049 us    |      rw_verify_area();
    1) + 15.424 us   |      __vfs_write();
    1)   ==========> |
    1)   6.003 us    |      smp_apic_timer_interrupt();
    1)   0.055 us    |      __fsnotify_parent();
    1)   0.073 us    |      fsnotify();
    1) + 23.665 us   |    }
    1) + 24.501 us   |  }

After:

    0)               |  SyS_write() {
    0)               |    __fdget_pos() {
    0)   0.052 us    |      __fget_light();
    0)   0.328 us    |    }
    0)               |    vfs_write() {
    0)   0.057 us    |      rw_verify_area();
    0)               |      __vfs_write() {
    0)   ==========> |
    0)   8.548 us    |        smp_apic_timer_interrupt();
    0)   <========== |
    0) + 36.507 us   |      } /* __vfs_write */
    0)   0.049 us    |      __fsnotify_parent();
    0)   0.066 us    |      fsnotify();
    0) + 50.064 us   |    }
    0) + 50.952 us   |  }

Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel.com
Cc: stable@vger.kernel.org
Fixes: f8b755ac ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Yisheng Xie

clear_ftrace_function() is not used outside of ftrace.c and is of no use as a standalone function, so nuke it per Steve's suggestion.

Link: http://lkml.kernel.org/r/1517537689-34947-1-git-send-email-xieyisheng1@huawei.com
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Mathieu Malaterre

Silence warnings (triggered at W=1) by adding the relevant __printf attributes.

      CC      kernel/trace/trace.o
    kernel/trace/trace.c: In function ‘__trace_array_vprintk’:
    kernel/trace/trace.c:2979:2: warning: function might be possible candidate for ‘gnu_printf’ format attribute [-Wsuggest-attribute=format]
      len = vscnprintf(tbuffer, TRACE_BUF_SIZE, fmt, args);
      ^~~
      AR      kernel/trace/built-in.o

Link: http://lkml.kernel.org/r/20180308205843.27447-1-malat@debian.org
Signed-off-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By yuan linyu

Simplify and optimize the logic in trace_buffer_iter() by using a conditional expression instead of an if statement.

Link: http://lkml.kernel.org/r/20180408113631.3947-1-cugyly@163.com
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Steven Rostedt (VMware)

The comment in create_filter() states that the passed-in filter pointer (filterp) will either be NULL or contain an error message stating why the filter failed. But it also expects the filter pointer to point to NULL when passed in. If it is not, the function create_filter_start() will warn and return an error message without updating the filter pointer. This is not what the comment states.

As we always expect the pointer to point to NULL, if it is not, trigger a WARN_ON(), set it to NULL, and then continue on the path, as the rest will work as the comment states. Also update the comment to state it must point to NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Arnd Bergmann

'err' is used as a NUL-terminated string, but using strncpy() with the length equal to the buffer size may result in a missing termination:

    kernel/trace/trace_events_hist.c: In function 'hist_err_event':
    kernel/trace/trace_events_hist.c:396:3: error: 'strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation]
       strncpy(err, var, MAX_FILTER_STR_VAL);

This changes it to use the safer strscpy() instead.

Link: http://lkml.kernel.org/r/20180328140920.2842153-1-arnd@arndb.de
Cc: stable@vger.kernel.org
Fixes: f404da6e ("tracing: Add 'last error' error facility for hist triggers")
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
- 03 July 2018, 6 commits
-
-
By Peter Zijlstra

Gaurav reports that commit:

  85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")

isn't working for him, because of the following race:

    controller Thread                     CPUHP Thread
    takedown_cpu
      kthread_park
        kthread_parkme
          Set KTHREAD_SHOULD_PARK
                                          smpboot_thread_fn
                                            set Task interruptible

      wake_up_process
        if (!(p->state & state))
          goto out;

                                          kthread_parkme
                                            SET TASK_PARKED
                                            schedule
                                              raw_spin_lock(&rq->lock)
      ttwu_remote
        waiting for __task_rq_lock
                                            context_switch

                                              finish_lock_switch

                                                Case TASK_PARKED
                                                  kthread_park_complete

      SET Running

Furthermore, Oleg noticed that the whole scheduler TASK_PARKED handling is buggered because the TASK_DEAD thing is done with preemption disabled; the current code can still complete early on preemption :/

So basically revert that earlier fix and go with a variant of the alternative mentioned in the commit. Promote TASK_PARKED to special state to avoid the store-store issue on task->state leading to the WARN in kthread_unpark() -> __kthread_bind().

But in addition, add wait_task_inactive() to kthread_park() to ensure the task really is PARKED when we return from kthread_park(). This avoids the whole "kthread still gets migrated" nonsense -- although it would be really good to get this done differently.

Reported-by: Gaurav Kohli <gkohli@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 85f1abe0 ("kthread, sched/wait: Fix kthread_parkme() completion issue")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Vincent Guittot

When a cfs_rq is throttled, the parent cfs_rq->nr_running is decreased and everything happens at the cfs_rq level. Currently util_est stays unchanged in such a case and keeps accounting the utilization of throttled tasks. This can somewhat make sense, as we don't dequeue the tasks but only the throttled cfs_rq.

If a task of another group is enqueued/dequeued and the root cfs_rq becomes idle during the dequeue, util_est will be cleared, whereas it was accounting the util_est of throttled tasks before. So the behavior of util_est is not always the same regarding throttled tasks and depends on side activity. Furthermore, util_est will not be updated when the cfs_rq is unthrottled, as everything happens at the cfs_rq level. The main result is that util_est stays at zero whereas we now have running tasks. We have to wait for the next dequeue/enqueue of the previously throttled tasks to get an up-to-date util_est.

Remove the assumption that a cfs_rq's estimated utilization of a CPU is 0 if there is no running task, so that the util_est of a task remains until the latter is dequeued, even if its cfs_rq has been throttled.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 7f65ea42 ("sched/fair: Add util_est on top of PELT")
Link: http://lkml.kernel.org/r/1528972380-16268-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Xunlei Pang

When the period gets restarted after some idle time, start_cfs_bandwidth() doesn't update the expiration information, so expire_cfs_rq_runtime() will see cfs_rq->runtime_expires smaller than the rq clock and go into the clock drift logic, wasting needless CPU cycles on the scheduler hot path.

Update the global expiration in start_cfs_bandwidth() to avoid frequent expire_cfs_rq_runtime() calls once a new period begins.

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180620101834.24455-2-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Xunlei Pang

I noticed that cgroup task groups constantly get throttled even if they have low CPU usage. This causes some jitters in the response time of some of our business containers when enabling CPU quotas.

It's very simple to reproduce:

    mkdir /sys/fs/cgroup/cpu/test
    cd /sys/fs/cgroup/cpu/test
    echo 100000 > cpu.cfs_quota_us
    echo $$ > tasks

then repeat:

    cat cpu.stat | grep nr_throttled  # nr_throttled will increase steadily

After some analysis, we found that cfs_rq::runtime_remaining will be cleared by expire_cfs_rq_runtime() due to two equal but stale "cfs_{b|q}->runtime_expires" after the period timer is re-armed.

The current condition used to judge clock drift in expire_cfs_rq_runtime() is wrong: the two runtime_expires are actually the same when clock drift happens, so this condition can never hit. The original design was correctly done by this commit:

  a9cf55b2 ("sched: Expire invalid runtime")

... but was changed to the current implementation due to its locking bug.

This patch introduces another way: it adds a new field to both struct cfs_rq and struct cfs_bandwidth to record the expiration update sequence, and uses them to figure out whether clock drift has happened (true if they are equal).

Signed-off-by: Xunlei Pang <xlpang@linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 51f2176d ("sched/fair: Fix unlocked reads of some cfs_b->quota/period")
Link: http://lkml.kernel.org/r/20180620101834.24455-1-xlpang@linux.alibaba.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Vincent Guittot

With commit:

  8f111bc3 ("cpufreq/schedutil: Rewrite CPUFREQ_RT support")

the schedutil governor uses rq->rt.rt_nr_running to detect whether an RT task is currently running on the CPU and to set the frequency to max if necessary.

cpufreq_update_util() is called in enqueue/dequeue_top_rt_rq(), but rq->rt.rt_nr_running has not been updated yet when dequeue_top_rt_rq() is called, so schedutil still considers that an RT task is running when the last task is dequeued. The update of rq->rt.rt_nr_running happens later in dequeue_rt_stack().

In fact, we can take advantage of the sequence of dequeuing and then re-enqueuing rt entities when an RT task is enqueued or dequeued: as a result, enqueue_top_rt_rq() is always called when a task is enqueued or dequeued, and also when groups are throttled or unthrottled. The only place that does not use enqueue_top_rt_rq() is when the root rt_rq is throttled.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: efault@gmx.de
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: viresh.kumar@linaro.org
Fixes: 8f111bc3 ('cpufreq/schedutil: Rewrite CPUFREQ_RT support')
Link: http://lkml.kernel.org/r/1530021202-21695-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Frederic Weisbecker

Some people have reported that the warning in sched_tick_remote() occasionally triggers, especially under RCU-Torture pressure:

    WARNING: CPU: 11 PID: 906 at kernel/sched/core.c:3138 sched_tick_remote+0xb6/0xc0
    Modules linked in:
    CPU: 11 PID: 906 Comm: kworker/u32:3 Not tainted 4.18.0-rc2+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
    Workqueue: events_unbound sched_tick_remote
    RIP: 0010:sched_tick_remote+0xb6/0xc0
    Code: e8 0f 06 b8 00 c6 03 00 fb eb 9d 8b 43 04 85 c0 75 8d 48 8b 83 e0 0a 00 00 48 85 c0 75 81 eb 88 48 89 df e8 bc fe ff ff eb aa <0f> 0b eb +c5 66 0f 1f 44 00 00 bf 17 00 00 00 e8 b6 2e fe ff 0f b6
    Call Trace:
     process_one_work+0x1df/0x3b0
     worker_thread+0x44/0x3d0
     kthread+0xf3/0x130
     ? set_worker_desc+0xb0/0xb0
     ? kthread_create_worker_on_cpu+0x70/0x70
     ret_from_fork+0x35/0x40

This happens when the remote tick applies to an idle task. Usually the idle_cpu() check avoids that, but it is performed before we lock the runqueue and is therefore racy. It was intended to be that way in order to prevent useless runqueue locks, since the idle task's tick callback is a no-op.

Now if the racy check slips out of our hands and we end up remotely ticking an idle task, the empty task_tick_idle() is harmless. Still, it won't pass the WARN_ON_ONCE() test that ensures rq_clock_task() is not too far from curr->se.exec_start, because update_curr_idle() doesn't update the exec_start value the way other scheduler policies do. Hence the reported false positive.

So let's have another check, while the rq is locked, to make sure we don't remote tick on an idle task. The lockless idle_cpu() still applies, to avoid unnecessary rq lock contention.

Reported-by: Jacek Tomaka <jacekt@dug.com>
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Anna-Maria Gleixner <anna-maria@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1530203381-31234-1-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 01 July 2018, 4 commits
-
-
By John Fastabend

Add the map_release_uref pointer to the hashmap ops. This was dropped when the original sockhash code was ported into bpf-next before the initial commit.

Fixes: 81110384 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
By John Fastabend

First, the sk_callback_lock() was being used to protect both the sock callback hooks and the psock->maps list. This got overly convoluted after the addition of sockhash (in sockmap it made some sense because maps and callbacks were tightly coupled), so let's split out a specific lock for maps and only use the callback lock for its intended purpose. This fixes a couple of cases where we missed using the maps lock when it was in fact needed. It also makes it easier to follow the code, because now we can put the locking closer to the actual code it serializes.

Next, in sock_hash_delete_elem() the pattern was as follows:

    sock_hash_delete_elem()
       [...]
       spin_lock(bucket_lock)
       l = lookup_elem_raw()
       if (l)
          hlist_del_rcu()
          write_lock(sk_callback_lock)
           .... destroy psock ...
          write_unlock(sk_callback_lock)
       spin_unlock(bucket_lock)

The ordering is necessary because we only know the {p}sock after dereferencing the hash table, which we can't do unless we have the bucket lock held. Once we have the bucket lock and the psock element, it is deleted from the hashmap to ensure any other path doing a lookup will fail. Finally, the refcnt is decremented, and if zero the psock is destroyed.

In parallel with the above (or with free'ing the map), a tcp close event may trigger tcp_close(), which at the moment omits the bucket lock altogether (oops!). The flow looks like this:

    bpf_tcp_close()
       [...]
       write_lock(sk_callback_lock)
       for each psock->maps  // list of maps this sock is part of
           hlist_del_rcu(ref_hash_node);
       .... destroy psock ...
       write_unlock(sk_callback_lock)

Obviously, and as demonstrated by syzbot, this is broken because we can have multiple threads deleting entries via hlist_del_rcu().

To fix this we might be tempted to wrap the hlist operation in a bucket lock, but that would create a lock inversion problem. In summary, to follow the locking rules, the psock's maps list needs the sk_callback_lock (after this patch, maps_lock), but we need the bucket lock to do the hlist_del_rcu.

To resolve the lock inversion problem, pop the head of the maps list repeatedly and remove the reference until no more are left. If a delete happens in parallel from the BPF API, that is OK as well, because it will do a similar action: look up the element in the map/hash, delete it from the map/hash, and dec the refcnt. We check for this case before doing a destroy on the psock, to ensure we don't have two threads tearing down a psock. The new logic is as follows:

    bpf_tcp_close()
       e = psock_map_pop(psock->maps)  // done with map lock
       bucket_lock()                   // lock hash list bucket
       l = lookup_elem_raw(head, hash, key, key_size);
       if (l) {
           // only get here if elmnt was not already removed
           hlist_del_rcu()
           ... destroy psock ...
       }
       bucket_unlock()

And finally, for all the above to work, add the missing locking around map operations per the above. Then add RCU annotations and use rcu_dereference/rcu_assign_pointer to manage values relying on RCU, so that the object is not free'd from sock_hash_free() while it is being referenced in bpf_tcp_close().

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384 ("bpf: sockmap, add hash map support")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
By John Fastabend

If a hashmap is free'd with open socks, it removes the reference to the hash entry from the psock. If that is the last reference to the psock, then the psock will also be free'd by the reference counting logic. However, the current logic that removes the hash reference from the list of references is broken. In smap_list_remove() we first check whether the sockmap entry matches and then check whether the hashmap entry matches. But the sockmap entry will always match, because it's NULL in this case, which causes the first entry to be removed from the list. If this is always the "right" entry (because the user adds/removes entries in order), then everything is OK, but otherwise a subsequent bpf_tcp_close() may reference a free'd object.

To fix this, create two list handlers: one for sockmap and one for sockhash.

Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com
Fixes: 81110384 ("bpf: sockmap, add hash map support")
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
By John Fastabend

This fixes a crash where we assign tcp_prot to IPv6 sockets instead of tcpv6_prot. Previously we overwrote the sk->sk_prot field with tcp_prot even in the AF_INET6 case. This patch ensures the correct tcp_prot and tcpv6_prot are used.

Tested with 'netserver -6' and 'netperf -H [IPv6]' as well as 'netperf -H [IPv4]'. The ESTABLISHED check resolves the previously crashing case here.

Fixes: 174a79ff ("bpf: sockmap with sk redirect support")
Reported-by: syzbot+5c063698bdbfac19f363@syzkaller.appspotmail.com
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Wei Wang <weiwan@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 30 June 2018, 1 commit
-
-
By Daniel Borkmann

Partially undo commit 9facc336 ("bpf: reject any prog that failed read-only lock") since it caused a regression: syzkaller managed to cause a panic via fault injection deep in the set_memory_ro() path by letting an allocation fail.

In x86's __change_page_attr_set_clr() it was able to change the attributes of the primary mapping but not of the alias mapping via cpa_process_alias(), so the second, inner call to __change_page_attr() via __change_page_attr_set_clr() had to split a larger page and failed in alloc_pages() with the artificially triggered allocation error, which is then propagated down to the call site. Thus, set_memory_ro() returned with an error, but debugging with a probe_kernel_write() revealed EFAULT on that memory, since the primary mapping had succeeded in being changed. Therefore the subsequent hdr->locked = 0 reset triggered the panic, as it was performed on read-only memory. So the call-site assumptions were in fact wrong: one cannot assume that set_memory_ro() either succeeds fully /or/ not at all, since there is no such rollback in set_memory_*() calls on a partial change of mappings; in other words, we're left in a state that is "half done". A later undo via set_memory_rw() succeeds, though, due to matching permissions on that part (aka due to try_preserve_large_page() succeeding). While reproducing locally with explicitly triggering this error, the initial splitting only happens on rare occasions, and in the real world it would additionally need OOM conditions; but that said, it could partially fail.

Therefore, it is definitely wrong to bail out on a set_memory_ro() error and reject the program with the set_memory_*() semantics we have today. We shouldn't have gone the extra mile, since no other user in the tree today in fact checks for any set_memory_*() errors, e.g. neither module_enable_ro() / module_disable_ro() for module RO/NX handling, which is mostly the default these days, nor kprobes core with alloc_insn_page() / free_insn_page(), as examples that could be invoked long after bootup; and the original 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") didn't check either when this got first introduced to BPF, so "improving" by bailing out was clearly not right when set_memory_*() cannot handle it today.

Kees suggested that if set_memory_*() can fail, we should annotate it with __must_check, and all callers need to deal with it gracefully, given those set_memory_*() markings aren't "advisory" but are expected to actually do what they say. This might be an option worth moving forward with in the future, but it would at the same time require that set_memory_*() calls from supporting archs are guaranteed to be "atomic" in that they provide rollback if part of the range fails. Once that happened, the transition from RW -> RO could be made more robust that way, while the subsequent RO -> RW transition /must/ continue guaranteeing to always succeed the undo part.

Reported-by: syzbot+a4eb8c7766952a1ca872@syzkaller.appspotmail.com
Reported-by: syzbot+d866d1925855328eac3b@syzkaller.appspotmail.com
Fixes: 9facc336 ("bpf: reject any prog that failed read-only lock")
Cc: Laura Abbott <labbott@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-
- 28 June 2018, 1 commit
-
-
By Christoph Hellwig

For architectures that do not use per-device dma ops, we need to export the dma_map_ops structure returned from get_arch_dma_ops().

Fixes: 10314e09 ("riscv: add swiotlb support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Andreas Schwab <schwab@suse.de>
-
- 27 June 2018, 1 commit
-
-
By Mathieu Malaterre

Fix a non-fatal warning triggered during compilation with W=1:

    kernel/events/core.c:6106:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
     static void __always_inline
     ^~~~~~

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180626202301.20270-1-malat@debian.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 26 June 2018, 1 commit
-
-
By Sean Young

If the kernel is compiled with CONFIG_CGROUP_BPF not enabled, it is not possible to attach, detach or query IR BPF programs on /dev/lircN devices, making them impossible to use. For embedded devices, it should be possible to use IR decoding without cgroups or CONFIG_CGROUP_BPF enabled.

This change requires some refactoring, since the bpf_prog_{attach,detach,query} functions are now always compiled, but their code paths for cgroups need moving out. Rather than adding a #ifdef CONFIG_CGROUP_BPF to kernel/bpf/syscall.c, moving them to kernel/bpf/cgroup.c and kernel/bpf/sockmap.c does not require #ifdefs, since those files are already conditionally compiled.

Fixes: f4364dcf ("media: rc: introduce BPF_PROG_LIRC_MODE2")
Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
-
- 23 June 2018, 1 commit
-
-
By Will Deacon

When delivering a signal to a task that is using rseq, we call into __rseq_handle_notify_resume() so that the registers pushed in the sigframe are updated to reflect the state of the restartable sequence (for example, ensuring that the signal returns to the abort handler if necessary).

However, if the rseq management fails due to an unrecoverable fault when accessing userspace, or due to certain combinations of RSEQ_CS_* flags, then we will attempt to deliver a SIGSEGV. This has the potential for infinite recursion if the rseq code continuously fails on signal delivery.

Avoid this problem by using force_sigsegv() instead of force_sig(), which is explicitly designed to reset the SEGV handler to SIG_DFL in the case of a recursive fault. In doing so, remove rseq_signal_deliver() from the internal rseq API and add an optional struct ksignal * parameter to rseq_handle_notify_resume() instead.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: peterz@infradead.org
Cc: paulmck@linux.vnet.ibm.com
Cc: boqun.feng@gmail.com
Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com
-
- 22 June 2018, 7 commits
-
-
By Geert Uytterhoeven

For the common cases where 1000 is a multiple of HZ, or HZ is a multiple of 1000, jiffies_to_msecs() never returns zero when passed a non-zero time period.

However, if HZ > 1000 and not an integer multiple of 1000 (e.g. 1024 or 1200, as used on alpha and DECstation), jiffies_to_msecs() may return zero for small non-zero time periods. This may break code that relies on receiving back a non-zero value.

jiffies_to_usecs() does not need such a fix: one jiffy can only be less than one µs if HZ > 1000000, and such large values of HZ are already rejected at build time, twice:

  - include/linux/jiffies.h does #error if HZ >= 12288,
  - kernel/time/time.c has BUILD_BUG_ON(HZ > USEC_PER_SEC).

Broken since forever.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Stephen Boyd <sboyd@kernel.org>
Cc: linux-alpha@vger.kernel.org
Cc: linux-mips@linux-mips.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180622143357.7495-1-geert@linux-m68k.org
-
By Marc Zyngier

The debug output is missing the IRQCHIP_SUPPORTS_LEVEL_MSI entry, making debugfs slightly less useful.

Take this opportunity to also add a missing comment in the definition of IRQCHIP_SUPPORTS_LEVEL_MSI.

Fixes: 6988e0e0 ("genirq/msi: Limit level-triggered MSI to platform devices")
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20180622095254.5906-2-marc.zyngier@arm.com
-
By Mathieu Malaterre

When building perf with W=1, the following warning triggers:

      CC      kernel/events/ring_buffer.o
    kernel/events/ring_buffer.c:105:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
     static bool __always_inline
     ^~~~~~
    ...

Move the inline keyword to the beginning of the function declaration.

Signed-off-by: Mathieu Malaterre <malat@debian.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: trival@kernel.org
Link: http://lkml.kernel.org/r/20180308202856.9378-1-malat@debian.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
By Gustavo A. R. Silva

Make use of the swap() macro and remove the unnecessary variable 'buf'. This makes the code easier to read and maintain. Also, it reduces the stack usage.

This code was detected with the help of Coccinelle.

Link: http://lkml.kernel.org/r/20180209175316.GA18720@embeddedgus
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Joel Fernandes (Google)

I'm able to reproduce a lockdep splat with the config options CONFIG_PROVE_LOCKING=y, CONFIG_DEBUG_LOCK_ALLOC=y and CONFIG_PREEMPTIRQ_EVENTS=y:

    $ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable

    [   26.112609] DEBUG_LOCKS_WARN_ON(current->softirqs_enabled)
    [   26.112636] WARNING: CPU: 0 PID: 118 at kernel/locking/lockdep.c:3854
    [...]
    [   26.144229] Call Trace:
    [   26.144926]  <IRQ>
    [   26.145506]  lock_acquire+0x55/0x1b0
    [   26.146499]  ? __do_softirq+0x46f/0x4d9
    [   26.147571]  ? __do_softirq+0x46f/0x4d9
    [   26.148646]  trace_preempt_on+0x8f/0x240
    [   26.149744]  ? trace_preempt_on+0x4d/0x240
    [   26.150862]  ? __do_softirq+0x46f/0x4d9
    [   26.151930]  preempt_count_sub+0x18a/0x1a0
    [   26.152985]  __do_softirq+0x46f/0x4d9
    [   26.153937]  irq_exit+0x68/0xe0
    [   26.154755]  smp_apic_timer_interrupt+0x271/0x280
    [   26.156056]  apic_timer_interrupt+0xf/0x20
    [   26.157105]  </IRQ>

The issue was this:

    preempt_count = 1 << SOFTIRQ_SHIFT

    __local_bh_enable(cnt = 1 << SOFTIRQ_SHIFT) {
        if (softirq_count() == (cnt & SOFTIRQ_MASK)) {
            trace_softirqs_on() {
                current->softirqs_enabled = 1;
            }
        }
        preempt_count_sub(cnt) {
            trace_preempt_on() {
                tracepoint() {
                    rcu_read_lock_sched() {
                        // jumps into lockdep

Here, preempt_count still has softirqs disabled, but current->softirqs_enabled is true, and we get a splat.

Link: http://lkml.kernel.org/r/20180607201143.247775-1-joel@joelfernandes.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Erick Reyes <erickreyes@google.com>
Cc: Julia Cartwright <julia@ni.com>
Cc: Byungchul Park <byungchul.park@lge.com>
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fixes: d5915816 ("tracing: Add support for preempt and irq enable/disable events")
Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Steven Rostedt (VMware)

syzkaller detected an out-of-bounds issue in the events filter code, specifically here:

       prog[N].pred = NULL;               /* #13 */
       prog[N].target = 1;                /* TRUE */
       prog[N+1].pred = NULL;
       prog[N+1].target = 0;              /* FALSE */
    -> prog[N-1].target = N;
       prog[N-1].when_to_branch = false;

As that's the first reference to an "N-1" index, it appears that the code got here with N = 0, which means the filter parser found no filter to parse (which shouldn't ever happen, but apparently it did). Add a new error to the parsing code that checks that N is not zero before entering this part of the code. If N = 0, then -EINVAL is returned, and an error message is added to the filter.

Cc: stable@vger.kernel.org
Fixes: 80765597 ("tracing: Rewrite filter logic to be simpler and faster")
Reported-by: air icy <icytxw@gmail.com>
bugzilla url: https://bugzilla.kernel.org/show_bug.cgi?id=200019
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
-
By Steven Rostedt (VMware)

While debugging where things were going wrong with mapping the enabling/disabling of interrupts to the lockdep state and the actual real enabling and disabling of interrupts, I had to silence the IRQ disabling/enabling in debug_check_no_locks_freed() because it was always showing up, as it was called before the splat was.

Use raw_local_irq_save/restore() not only for debug_check_no_locks_freed() but for all internal lockdep functions, as the non-raw variants hide useful information about where interrupts were last used incorrectly.

Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Link: https://lkml.kernel.org/lkml/20180404140630.3f4f4c7a@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 21 June 2018, 1 commit
-
-
By Mark Rutland

While __atomic_add_unless() was originally intended as a building block for atomic_add_unless(), it's now used in a number of places around the kernel. It's the only common atomic operation named __atomic*() rather than atomic_*(), and for consistency it would be better named atomic_fetch_add_unless().

This lack of consistency is slightly confusing and gets in the way of scripting atomics. Given that, let's clean things up and promote it to an official part of the atomics API, in the form of atomic_fetch_add_unless().

This patch converts definitions and invocations over to the new name, including the instrumented version, using the following script:

    ----
    git grep -w __atomic_add_unless | while read line; do
      sed -i '{s/\<__atomic_add_unless\>/atomic_fetch_add_unless/}' "${line%%:*}";
    done
    git grep -w __arch_atomic_add_unless | while read line; do
      sed -i '{s/\<__arch_atomic_add_unless\>/arch_atomic_fetch_add_unless/}' "${line%%:*}";
    done
    ----

Note that we do not have atomic{64,_long}_fetch_add_unless(), which will be introduced by later patches.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@sifive.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/lkml/20180621121321.4761-2-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 20 June 2018, 1 commit
-
-
By Waiman Long

It was found that the use of up_read_non_owner() in NFS was causing the following warning when DEBUG_RWSEMS was configured:

    DEBUG_LOCKS_WARN_ON(sem->owner != ((struct task_struct *)(1UL << 0)))

Looking into the rwsem.c file, it was discovered that the corresponding down_read_non_owner() function was not setting the owner field properly. This is fixed now, and the warning should be gone.

Fixes: 5149cbac ("locking/rwsem: Add DEBUG_RWSEMS to look for lock/unlock mismatches")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Gavin Schenk <g.schenk@eckelmann.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1527168398-4291-1-git-send-email-longman@redhat.com
-
- 19 June 2018, 1 commit
-
-
By Arnd Bergmann

Commit b5793b0d added support for building the nanosleep compat system call on 32-bit architectures, but missed one change in nanosleep_copyout(), which would trigger a BUG() as soon as any architecture is switched over to use it.

Use the proper config symbol to enable the code path.

Fixes: b5793b0d ("posix-timers: Make compat syscalls depend on CONFIG_COMPAT_32BIT_TIME")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: y2038@lists.linaro.org
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Deepa Dinamani <deepa.kernel@gmail.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Link: https://lkml.kernel.org/r/20180618140811.2998503-1-arnd@arndb.de
-