- 10 4月, 2018 1 次提交
-
-
由 Prashant Bhole 提交于
A use-after-free bug was caught by KASAN while running usdt related code (BCC project. bcc/tests/python/test_usdt2.py): ================================================================== BUG: KASAN: use-after-free in uprobe_perf_close+0x222/0x3b0 Read of size 4 at addr ffff880384f9b4a4 by task test_usdt2.py/870 CPU: 4 PID: 870 Comm: test_usdt2.py Tainted: G W 4.16.0-next-20180409 #215 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 Call Trace: dump_stack+0xc7/0x15b ? show_regs_print_info+0x5/0x5 ? printk+0x9c/0xc3 ? kmsg_dump_rewind_nolock+0x6e/0x6e ? uprobe_perf_close+0x222/0x3b0 print_address_description+0x83/0x3a0 ? uprobe_perf_close+0x222/0x3b0 kasan_report+0x1dd/0x460 ? uprobe_perf_close+0x222/0x3b0 uprobe_perf_close+0x222/0x3b0 ? probes_open+0x180/0x180 ? free_filters_list+0x290/0x290 trace_uprobe_register+0x1bb/0x500 ? perf_event_attach_bpf_prog+0x310/0x310 ? probe_event_disable+0x4e0/0x4e0 perf_uprobe_destroy+0x63/0xd0 _free_event+0x2bc/0xbd0 ? lockdep_rcu_suspicious+0x100/0x100 ? ring_buffer_attach+0x550/0x550 ? kvm_sched_clock_read+0x1a/0x30 ? perf_event_release_kernel+0x3e4/0xc00 ? __mutex_unlock_slowpath+0x12e/0x540 ? wait_for_completion+0x430/0x430 ? lock_downgrade+0x3c0/0x3c0 ? lock_release+0x980/0x980 ? do_raw_spin_trylock+0x118/0x150 ? do_raw_spin_unlock+0x121/0x210 ? do_raw_spin_trylock+0x150/0x150 perf_event_release_kernel+0x5d4/0xc00 ? put_event+0x30/0x30 ? fsnotify+0xd2d/0xea0 ? sched_clock_cpu+0x18/0x1a0 ? __fsnotify_update_child_dentry_flags.part.0+0x1b0/0x1b0 ? pvclock_clocksource_read+0x152/0x2b0 ? pvclock_read_flags+0x80/0x80 ? kvm_sched_clock_read+0x1a/0x30 ? sched_clock_cpu+0x18/0x1a0 ? pvclock_clocksource_read+0x152/0x2b0 ? locks_remove_file+0xec/0x470 ? pvclock_read_flags+0x80/0x80 ? fcntl_setlk+0x880/0x880 ? ima_file_free+0x8d/0x390 ? lockdep_rcu_suspicious+0x100/0x100 ? ima_file_check+0x110/0x110 ? fsnotify+0xea0/0xea0 ? kvm_sched_clock_read+0x1a/0x30 ? rcu_note_context_switch+0x600/0x600 perf_release+0x21/0x40 __fput+0x264/0x620 ? fput+0xf0/0xf0 ? do_raw_spin_unlock+0x121/0x210 ? do_raw_spin_trylock+0x150/0x150 ? SyS_fchdir+0x100/0x100 ? fsnotify+0xea0/0xea0 task_work_run+0x14b/0x1e0 ? task_work_cancel+0x1c0/0x1c0 ? copy_fd_bitmaps+0x150/0x150 ? vfs_read+0xe5/0x260 exit_to_usermode_loop+0x17b/0x1b0 ? trace_event_raw_event_sys_exit+0x1a0/0x1a0 do_syscall_64+0x3f6/0x490 ? syscall_return_slowpath+0x2c0/0x2c0 ? lockdep_sys_exit+0x1f/0xaa ? syscall_return_slowpath+0x1a3/0x2c0 ? lockdep_sys_exit+0x1f/0xaa ? prepare_exit_to_usermode+0x11c/0x1e0 ? enter_from_user_mode+0x30/0x30 random: crng init done ? __put_user_4+0x1c/0x30 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 RIP: 0033:0x7f41d95f9340 RSP: 002b:00007fffe71e4268 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 000000000000000d RCX: 00007f41d95f9340 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 000000000000000d RBP: 0000000000000000 R08: 00007f41ca8ff700 R09: 00007f41d996dd1f R10: 00007fffe71e41e0 R11: 0000000000000246 R12: 00007fffe71e4330 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007fffe71e4290 Allocated by task 870: kasan_kmalloc+0xa0/0xd0 kmem_cache_alloc_node+0x11a/0x430 copy_process.part.19+0x11a0/0x41c0 _do_fork+0x1be/0xa20 do_syscall_64+0x198/0x490 entry_SYSCALL_64_after_hwframe+0x3d/0xa2 Freed by task 0: __kasan_slab_free+0x12e/0x180 kmem_cache_free+0x102/0x4d0 free_task+0xfe/0x160 __put_task_struct+0x189/0x290 delayed_put_task_struct+0x119/0x250 rcu_process_callbacks+0xa6c/0x1b60 __do_softirq+0x238/0x7ae The buggy address belongs to the object at ffff880384f9b480 which belongs to the cache task_struct of size 12928 It occurs because task_struct is freed before perf_event which refers to the task and task flags are checked while teardown of the event. perf_event_alloc() assigns task_struct to hw.target of perf_event, but there is no reference counting for it. As a fix we get_task_struct() in perf_event_alloc() at above mentioned assignment and put_task_struct() in _free_event(). Signed-off-by: NPrashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Reviewed-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 63b6da39 ("perf: Fix perf_event_exit_task() race") Link: http://lkml.kernel.org/r/20180409100346.6416-1-bhole_prashant_q7@lab.ntt.co.jpSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 03 4月, 2018 17 次提交
-
-
由 Dominik Brodowski 提交于
This keeps it in line with the SYSCALL_DEFINEx() / COMPAT_SYSCALL_DEFINEx() calling convention. Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Shuffle the cond_syscall() entries in kernel/sys_ni.c around so that they are kept in the same order as in include/uapi/asm-generic/unistd.h. For better structuring, add the same comments as in that file, but keep a few additional comments and extend the commentary where it seems useful. Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel call to the sys_setsid() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_setsid(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel calls to the sys_unshare() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_unshare(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel calls to the sys_sync() syscall. The ksys_ prefix denotes that this function is meant as a drop-in replacement for the syscall. In particular, it uses the same calling convention as sys_sync(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using the fs-interal do_fchownat() wrapper allows us to get rid of fs-internal calls to the sys_fchownat() syscall. Introducing the ksys_fchown() helper and the ksys_{,}chown() wrappers allows us to avoid the in-kernel calls to the sys_{,l,f}chown() syscalls. The ksys_ prefix denotes that these functions are meant as a drop-in replacement for the syscalls. In particular, they use the same calling convention as sys_{,l,f}chown(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
While sys32_quotactl() is only needed on x86, it can use the recommended COMPAT_SYSCALL_DEFINEx() machinery for its setup. Acked-by: NJan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Move compat_sys_move_pages() to mm/migrate.c and make it call a newly introduced helper -- kernel_move_pages() -- instead of the syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Move compat_sys_migrate_pages() to mm/mempolicy.c and make it call a newly introduced helper -- kernel_migrate_pages() -- instead of the syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using the sched-internal do_sched_yield() helper allows us to get rid of the sched-internal call to the sys_sched_yield() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Ingo Molnar <mingo@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using these helpers allows us to avoid the in-kernel calls to these syscalls: sys_setregid(), sys_setgid(), sys_setreuid(), sys_setuid(), sys_setresuid(), sys_setresgid(), sys_setfsuid(), and sys_setfsgid(). The ksys_ prefix denotes that these function are meant as a drop-in replacement for the syscall. In particular, they use the same calling convention. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using this helper allows us to avoid the in-kernel call to the compat_sys_sigaltstack() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
Using the do_getpgid() helper removes an in-kernel call to the sys_getpgid() syscall. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
sys_futex() is a wrapper to do_futex() which does not modify any values here: - uaddr, val and val3 are kept the same - op is masked with FUTEX_CMD_MASK, but is always set to FUTEX_WAKE. Therefore, val2 is always 0. - as utime is set to NULL, *timeout is NULL This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
do_kexec_load() can be called directly by compat_sys_kexec() as long as the same parameters checks are completed which are currently handled (also) by sys_kexec(). Therefore, move those to kexec_load_check(), call that newly introduced helper function from both sys_kexec() and compat_sys_kexec(), and duplicate the remaining code from sys_kexec() in compat_sys_kexec(). This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Eric Biederman <ebiederm@xmission.com> Cc: kexec@lists.infradead.org Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
A similar but not fully equivalent code path is already open-coded three times (in sys_rt_sigpending and in the two compat stubs), so do it a fourth time here. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.net Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
由 Dominik Brodowski 提交于
All call sites of sys_wait4() set *rusage to NULL. Therefore, there is no need for the copy_to_user() handling of *rusage, and we can use kernel_wait4() directly. This patch is part of a series which removes in-kernel calls to syscalls. On this basis, the syscall entry path can be streamlined. For details, see http://lkml.kernel.org/r/20180325162527.GA17492@light.dominikbrodowski.netAcked-by: NLuis R. Rodriguez <mcgrof@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
-
- 31 3月, 2018 1 次提交
-
-
由 Waiman Long 提交于
For a rwsem, locking can either be exclusive or shared. The corresponding exclusive or shared unlock must be used. Otherwise, the protected data structures may get corrupted or the lock may be in an inconsistent state. In order to detect such anomaly, a new configuration option DEBUG_RWSEMS is added which can be enabled to look for such mismatches and print warnings that that happens. Signed-off-by: NWaiman Long <longman@redhat.com> Acked-by: NDavidlohr Bueso <dave@stgolabs.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1522445280-7767-2-git-send-email-longman@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 29 3月, 2018 3 次提交
-
-
由 Alexander Shishkin 提交于
This is a cosmetic patch that deals with the address filter structure's ambiguous fields 'filter' and 'range'. The former stands to mean that the filter's *action* should be to filter the traces to its address range if it's set or stop tracing if it's unset. This is confusing and hard on the eyes, so this patch replaces it with 'action' enum. The 'range' field is completely redundant (meaning that the filter is an address range as opposed to a single address trigger), as we can use zero size to mean the same thing. Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20180329120648.11902-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tetsuo Handa 提交于
The lock debug output in print_lock() has a few shortcomings: - It prints the hlock->acquire_ip field in %px and %pS format. That's redundant information. - It lacks information about the lock object itself. The lock class is not helpful to identify a particular instance of a lock. Change the output so it prints: - hlock->instance to allow identification of a particular lock instance. - only the %pS format of hlock->ip_acquire which is sufficient to decode the actual code line with faddr2line. The resulting output is: 3 locks held by a.out/31106: #0: 00000000b0f753ba (&mm->mmap_sem){++++}, at: copy_process.part.41+0x10d5/0x1fe0 #1: 00000000ef64d539 (&mm->mmap_sem/1){+.+.}, at: copy_process.part.41+0x10fe/0x1fe0 #2: 00000000b41a282e (&mapping->i_mmap_rwsem){++++}, at: copy_process.part.41+0x12f2/0x1fe0 [ tglx: Massaged changelog ] Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: linux-mm@kvack.org Cc: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/201803271941.GBE57310.tVSOJLQOFFOHFM@I-love.SAKURA.ne.jp
-
由 Peter Zijlstra 提交于
In -RT task_blocks_on_rt_mutex() may return with -EAGAIN due to (->pi_blocked_on == PI_WAKEUP_INPROGRESS) before it added itself as a waiter. In such a case remove_waiter() must not be called because without a waiter it will trigger the BUG_ON() statement. This was initially reported by Yimin Deng. Thomas Gleixner fixed it then with an explicit check for waiters before calling remove_waiter(). Instead of an explicit NULL check before calling rt_mutex_top_waiter() make the function return NULL if there are no waiters. With that fixed the now pointless NULL check is removed from rt_mutex_slowlock(). Reported-and-debugged-by: NYimin Deng <yimin11.deng@gmail.com> Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/CAAh1qt=DCL9aUXNxanP5BKtiPp3m+qj4yB+gDohhXPVFCxWwzg@mail.gmail.com Link: https://lkml.kernel.org/r/20180327121438.sss7hxg3crqy4ecd@linutronix.de
-
- 28 3月, 2018 1 次提交
-
-
由 Linus Torvalds 提交于
Annoyingly, modify_user_hw_breakpoint() unnecessarily complicates the modification of a breakpoint - simplify it and remove the pointless local variables. Also update the stale Docbook while at it. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: <stable@vger.kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 27 3月, 2018 1 次提交
-
-
由 Davidlohr Bueso 提交于
No changes in refcount semantics, use DEFINE_STATIC_KEY_FALSE() for initialization and replace: static_key_slow_inc|dec() => static_branch_inc|dec() static_key_false() => static_branch_unlikely() Signed-off-by: NDavidlohr Bueso <dbueso@suse.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akpm@linux-foundation.org Link: http://lkml.kernel.org/r/20180326210929.5244-4-dave@stgolabs.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 3月, 2018 2 次提交
-
-
由 Claudio Scordino 提交于
When the SCHED_DEADLINE scheduling class increases the CPU utilization, it should not wait for the rate limit, otherwise it may miss some deadline. Tests using rt-app on Exynos5422 with up to 10 SCHED_DEADLINE tasks have shown reductions of even 10% of deadline misses with a negligible increase of energy consumption (measured through Baylibre Cape). Signed-off-by: NClaudio Scordino <claudio@evidence.eu.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: linux-pm@vger.kernel.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Todd Kjos <tkjos@android.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/1520937340-2755-1-git-send-email-claudio@evidence.eu.com
-
由 Masami Hiramatsu 提交于
In Documentation/trace/kprobetrace.txt, it says @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol) However, the parser doesn't parse minus offset correctly, since commit 2fba0c88 ("tracing/kprobes: Fix probe offset to be unsigned") drops minus ("-") offset support for kprobe probe address usage. This fixes the traceprobe_split_symbol_offset() to parse minus offset again with checking the offset range, and add a minus offset check in kprobe probe address usage. Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox Cc: Ingo Molnar <mingo@redhat.com> Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 2fba0c88 ("tracing/kprobes: Fix probe offset to be unsigned") Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 22 3月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
The clockid argument of clockid_to_kclock() comes straight from user space via various syscalls and is used as index into the posix_clocks array. Protect it against spectre v1 array out of bounds speculation. Remove the redundant check for !posix_clock[id] as this is another source for speculation and does not provide any advantage over the return posix_clock[id] path which returns NULL in that case anyway. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Cc: Greg KH <gregkh@linuxfoundation.org> Cc: stable@vger.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1802151718320.1296@nanos.tec.linutronix.de
-
- 21 3月, 2018 2 次提交
-
-
由 Chenbo Feng 提交于
The current check statement in BPF syscall will do a capability check for CAP_SYS_ADMIN before checking sysctl_unprivileged_bpf_disabled. This code path will trigger unnecessary security hooks on capability checking and cause false alarms on unprivileged process trying to get CAP_SYS_ADMIN access. This can be resolved by simply switch the order of the statement and CAP_SYS_ADMIN is not required anyway if unprivileged bpf syscall is allowed. Signed-off-by: NChenbo Feng <fengc@google.com> Acked-by: NLorenzo Colitti <lorenzo@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Yonghong Song 提交于
Commit 4bebdc7a ("bpf: add helper bpf_perf_prog_read_value") added helper bpf_perf_prog_read_value so that perf_event type program can read event counter and enabled/running time. This commit, however, introduced a bug which allows this helper for tracepoint type programs. This is incorrect as bpf_perf_prog_read_value needs to access perf_event through its bpf_perf_event_data_kern type context, which is not available for tracepoint type program. This patch fixed the issue by separating bpf_func_proto between tracepoint and perf_event type programs and removed bpf_perf_prog_read_value from tracepoint func prototype. Fixes: 4bebdc7a ("bpf: add helper bpf_perf_prog_read_value") Reported-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 20 3月, 2018 11 次提交
-
-
由 Joe Lawrence 提交于
Scheduler debug stats include newlines that display out of alignment when prefixed by timestamps. For example, the dmesg utility: % echo t > /proc/sysrq-trigger % dmesg ... [ 83.124251] runnable tasks: S task PID tree-key switches prio wait-time sum-exec sum-sleep ----------------------------------------------------------------------------------------------------------- At the same time, some syslog utilities (like rsyslog by default) don't like the additional newlines control characters, saving lines like this to /var/log/messages: Mar 16 16:02:29 localhost kernel: #012runnable tasks:#12 S task PID tree-key ... ^^^^ ^^^^ Clean these up by moving newline characters to their own SEQ_printf invocation. This leaves the /proc/sched_debug unchanged, but brings the entire output into alignment when prefixed: % echo t > /proc/sysrq-trigger % dmesg ... [ 62.410368] runnable tasks: [ 62.410368] S task PID tree-key switches prio wait-time sum-exec sum-sleep [ 62.410369] ----------------------------------------------------------------------------------------------------------- [ 62.410369] I kworker/u12:0 5 1932.215593 332 120 0.000000 3.621252 0.000000 0 0 / and no escaped control characters from rsyslog in /var/log/messages: Mar 16 16:15:06 localhost kernel: runnable tasks: Mar 16 16:15:06 localhost kernel: S task PID tree-key ... Signed-off-by: NJoe Lawrence <joe.lawrence@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-3-git-send-email-joe.lawrence@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Joe Lawrence 提交于
When the SEQ_printf() macro prints to the console, it runs a simple printk() without KERN_CONT "continued" line printing. The result of this is oddly wrapped task info, for example: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 29.608611] I [ 29.608613] rcu_sched 8 3252.013846 4087 120 [ 29.608614] 0.000000 29.090111 0.000000 [ 29.608615] 0 0 [ 29.608616] / Modify SEQ_printf to use pr_cont() for expected one-line results: % echo t > /proc/sysrq-trigger % dmesg ... runnable tasks: ... [ 106.716329] S cpuhp/5 37 2006.315026 14 120 0.000000 0.496893 0.000000 0 0 / Signed-off-by: NJoe Lawrence <joe.lawrence@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1521484555-8620-2-git-send-email-joe.lawrence@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Song Liu 提交于
When a perf_event is attached to parent cgroup, it should count events for all children cgroups: parent_group <---- perf_event \ - child_group <---- process(es) However, in our tests, we found this perf_event cannot report reliable results. Here is an example case: # create cgroups mkdir -p /sys/fs/cgroup/p/c # start perf for parent group perf stat -e instructions -G "p" # on another console, run test process in child cgroup: stressapptest -s 2 -M 1000 & echo $! > /sys/fs/cgroup/p/c/cgroup.procs # after the test process is done, stop perf in the first console shows <not counted> instructions p The instruction should not be "not counted" as the process runs in the child cgroup. We found this is because perf_event->cgrp and cpuctx->cgrp are not identical, thus perf_event->cgrp are not updated properly. This patch fixes this by updating perf_cgroup properly for ancestor cgroup(s). Reported-by: NEphraim Park <ephiepark@fb.com> Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: <jolsa@redhat.com> Cc: <kernel-team@fb.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20180312165943.1057894-1-songliubraving@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
With the following commit: 33352244 ("jump_label: Explicitly disable jump labels in __init code") ... we explicitly disabled jump labels in __init code, so they could be detected and not warned about in the following commit: dc1dd184 ("jump_label: Warn on failed jump_label patching attempt") In-kernel __exit code has the same issue. It's never used, so it's freed along with the rest of initmem. But jump label entries in __exit code aren't explicitly disabled, so we get the following warning when enabling pr_debug() in __exit code: can't patch jump_label at dmi_sysfs_exit+0x0/0x2d WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0 Fix the warning by disabling all jump labels in initmem (which includes both __init and __exit code). Reported-and-tested-by: NLi Wang <liwang@redhat.com> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Jason Baron <jbaron@akamai.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: dc1dd184 ("jump_label: Warn on failed jump_label patching attempt") Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Since we fixed hash_64() to not suck there is no need to play games to attempt to improve the hash value on 64-bit. Also, since we don't use the bit value for the variables, use hash_ptr() directly. No change in functionality. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: George Spelvin <linux@sciencehorizons.net> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
There are no users left (everyone got converted to wait_var_event()), remove it. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
As a replacement for the wait_on_atomic_t() API provide the wait_var_event() API. The wait_var_event() API is based on the very same hashed-waitqueue idea, but doesn't care about the type (atomic_t) or the specific condition (atomic_read() == 0). IOW. it's much more widely applicable/flexible. It shares all the benefits/disadvantages of a hashed-waitqueue approach with the existing wait_on_atomic_t/wait_on_bit() APIs. The API is modeled after the existing wait_event() API, but instead of taking a wait_queue_head, it takes an address. This addresses is hashed to obtain a wait_queue_head from the bit_wait_table. Similar to the wait_event() API, it takes a condition expression as second argument and will wait until this expression becomes true. The following are (mostly) identical replacements: wait_on_atomic_t(&my_atomic, atomic_t_wait, TASK_UNINTERRUPTIBLE); wake_up_atomic_t(&my_atomic); wait_var_event(&my_atomic, !atomic_read(&my_atomic)); wake_up_var(&my_atomic); The only difference is that wake_up_var() is an unconditional wakeup and doesn't check the previously hard-coded (atomic_read() == 0) condition here. This is of little concequence, since most callers are already conditional on atomic_dec_and_test() and the ones that are not, are trivial to make so. Tested-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: David Howells <dhowells@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Patrick Bellasi 提交于
The estimated utilization of a task is currently updated every time the task is dequeued. However, to keep overheads under control, PELT signals are effectively updated at maximum once every 1ms. Thus, for really short running tasks, it can happen that their util_avg value has not been updates since their last enqueue. If such tasks are also frequently running tasks (e.g. the kind of workload generated by hackbench) it can also happen that their util_avg is updated only every few activations. This means that updating util_est at every dequeue potentially introduces not necessary overheads and it's also conceptually wrong if the util_avg signal has never been updated during a task activation. Let's introduce a throttling mechanism on task's util_est updates to sync them with util_avg updates. To make the solution memory efficient, both in terms of space and load/store operations, we encode a synchronization flag into the LSB of util_est.enqueued. This makes util_est an even values only metric, which is still considered good enough for its purpose. The synchronization bit is (re)set by __update_load_avg_se() once the PELT signal of a task has been updated during its last activation. Such a throttling mechanism allows to keep under control util_est overheads in the wakeup hot path, thus making it a suitable mechanism which can be enabled also on high-intensity workload systems. Thus, this now switches on by default the estimation utilization scheduler feature. Suggested-by: NChris Redpath <chris.redpath@arm.com> Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-5-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Patrick Bellasi 提交于
When schedutil looks at the CPU utilization, the current PELT value for that CPU is returned straight away. In certain scenarios this can have undesired side effects and delays on frequency selection. For example, since the task utilization is decayed at wakeup time, a long sleeping big task newly enqueued does not add immediately a significant contribution to the target CPU. This introduces some latency before schedutil will be able to detect the best frequency required by that task. Moreover, the PELT signal build-up time is a function of the current frequency, because of the scale invariant load tracking support. Thus, starting from a lower frequency, the utilization build-up time will increase even more and further delays the selection of the actual frequency which better serves the task requirements. In order to reduce these kind of latencies, we integrate the usage of the CPU's estimated utilization in the sugov_get_util function. This allows to properly consider the expected utilization of a CPU which, for example, has just got a big task running after a long sleep period. Ultimately this allows to select the best frequency to run a task right after its wake-up. Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-4-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Patrick Bellasi 提交于
When the scheduler looks at the CPU utilization, the current PELT value for a CPU is returned straight away. In certain scenarios this can have undesired side effects on task placement. For example, since the task utilization is decayed at wakeup time, when a long sleeping big task is enqueued it does not add immediately a significant contribution to the target CPU. As a result we generate a race condition where other tasks can be placed on the same CPU while it is still considered relatively empty. In order to reduce this kind of race conditions, this patch introduces the required support to integrate the usage of the CPU's estimated utilization in the wakeup path, via cpu_util_wake(), as well as in the load-balance path, via cpu_util() which is used by update_sg_lb_stats(). The estimated utilization of a CPU is defined to be the maximum between its PELT's utilization and the sum of the estimated utilization (at previous dequeue time) of all the tasks currently RUNNABLE on that CPU. This allows to properly represent the spare capacity of a CPU which, for example, has just got a big task running since a long sleep period. Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-3-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Patrick Bellasi 提交于
The util_avg signal computed by PELT is too variable for some use-cases. For example, a big task waking up after a long sleep period will have its utilization almost completely decayed. This introduces some latency before schedutil will be able to pick the best frequency to run a task. The same issue can affect task placement. Indeed, since the task utilization is already decayed at wakeup, when the task is enqueued in a CPU, this can result in a CPU running a big task as being temporarily represented as being almost empty. This leads to a race condition where other tasks can be potentially allocated on a CPU which just started to run a big task which slept for a relatively long period. Moreover, the PELT utilization of a task can be updated every [ms], thus making it a continuously changing value for certain longer running tasks. This means that the instantaneous PELT utilization of a RUNNING task is not really meaningful to properly support scheduler decisions. For all these reasons, a more stable signal can do a better job of representing the expected/estimated utilization of a task/cfs_rq. Such a signal can be easily created on top of PELT by still using it as an estimator which produces values to be aggregated on meaningful events. This patch adds a simple implementation of util_est, a new signal built on top of PELT's util_avg where: util_est(task) = max(task::util_avg, f(task::util_avg@dequeue)) This allows to remember how big a task has been reported by PELT in its previous activations via f(task::util_avg@dequeue), which is the new _task_util_est(struct task_struct*) function added by this patch. If a task should change its behavior and it runs longer in a new activation, after a certain time its util_est will just track the original PELT signal (i.e. task::util_avg). The estimated utilization of cfs_rq is defined only for root ones. That's because the only sensible consumer of this signal are the scheduler and schedutil when looking for the overall CPU utilization due to FAIR tasks. For this reason, the estimated utilization of a root cfs_rq is simply defined as: util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued) where: cfs_rq::util_est::enqueued = sum(_task_util_est(task)) for each RUNNABLE task on that root cfs_rq It's worth noting that the estimated utilization is tracked only for objects of interests, specifically: - Tasks: to better support tasks placement decisions - root cfs_rqs: to better support both tasks placement decisions as well as frequencies selection Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Cc: Joel Fernandes <joelaf@google.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Paul Turner <pjt@google.com> Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com> Cc: Steve Muckle <smuckle@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Todd Kjos <tkjos@android.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Viresh Kumar <viresh.kumar@linaro.org> Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-