- 17 2月, 2017 1 次提交
-
-
由 Joel Fernandes 提交于
The comment about ring buffer's organization is outdated and the code sits elsewhere, remove the comment. Link: http://lkml.kernel.org/r/20170217041058.23904-1-joelaf@google.com Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NJoel Fernandes <joelaf@google.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 15 2月, 2017 7 次提交
-
-
由 Masami Hiramatsu 提交于
Since tracing/*probe_events will accept a probe definition up to 4096 - 2 ('\n' and '\0') bytes, it must show 4094 instead of 4096 in warning message. Note that there is one possible case of exceed 4094. If user prepare 4096 bytes null-terminated string and syscall write it with the count == 4095, then it can be accepted. However, if user puts a '\n' after that, it must rejected. So IMHO, the warning message should indicate shorter one, since it is safer. Link: http://lkml.kernel.org/r/148673290462.2579.7966778294009665632.stgit@devboxSigned-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Wei Yongjun 提交于
In case of error, the function kthread_run() returns ERR_PTR() and never returns NULL. The NULL test in the return value check should be replaced with IS_ERR(). Link: http://lkml.kernel.org/r/20170112135502.28556-1-weiyj.lk@gmail.com Cc: stable@vger.kernel.org Fixes: 81dc9f0e ("tracing: Add tracepoint benchmark tracepoint") Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Arnd Bergmann 提交于
We get a lot of harmless warnings about this header file at W=1 level because of an unusual function declaration: kernel/trace/trace.h:766:1: error: 'inline' is not at beginning of declaration [-Werror=old-style-declaration] This moves the inline statement where it normally belongs, avoiding the warning. Link: http://lkml.kernel.org/r/20170123122521.3389010-1-arnd@arndb.de Fixes: 4046bf02 ("ftrace: Expose ftrace_hash_empty and ftrace_lookup_ip") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Jason Baron 提交于
The static_key->next field goes mostly unused. The field is used for associating module uses with a static key. Most uses of struct static_key define a static key in the core kernel and make use of it entirely within the core kernel, or define the static key in a module and make use of it only from within that module. In fact, of the ~3,000 static keys defined, I found only about 5 or so that did not fit this pattern. Thus, we can remove the static_key->next field entirely and overload the static_key->entries field. That is, when all the static_key uses are contained within the same module, static_key->entries continues to point to those uses. However, if the static_key uses are not contained within the module where the static_key is defined, then we allocate a struct static_key_mod, store a pointer to the uses within that struct static_key_mod, and have the static key point at the static_key_mod. This does incur some extra memory usage when a static_key is used in a module that does not define it, but since there are only a handful of such cases there is a net savings. In order to identify if the static_key->entries pointer contains a struct static_key_mod or a struct jump_entry pointer, bit 1 of static_key->entries is set to 1 if it points to a struct static_key_mod and is 0 if it points to a struct jump_entry. We were already using bit 0 in a similar way to store the initial value of the static_key. This does mean that allocations of struct static_key_mod and that the struct jump_entry tables need to be at least 4-byte aligned in memory. As far as I can tell all arches meet this criteria. For my .config, the patch increased the text by 778 bytes, but reduced the data + bss size by 14912, for a net savings of 14,134 bytes. text data bss dec hex filename 8092427 5016512 790528 13899467 d416cb vmlinux.pre 8093205 5001600 790528 13885333 d3df95 vmlinux.post Link: http://lkml.kernel.org/r/1486154544-4321-1-git-send-email-jbaron@akamai.com Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: NJason Baron <jbaron@akamai.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Masami Hiramatsu 提交于
Show "trace_probe:", "trace_kprobe:" and "trace_uprobe:" headers for each warning/error/info message. This will help people to notice that kprobe/uprobe events caused those messages. Link: http://lkml.kernel.org/r/148646647813.24658.16705315294927615333.stgit@devboxSigned-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Luiz Capitulino 提交于
The ftrace hwlat does support a cpumask. Link: http://lkml.kernel.org/r/20170213122517.6e211955@redhat.comSigned-off-by: NLuiz Capitulino <lcapitulino@redhat.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
The code in traceprobe_probes_write() reads up to 4096 bytes from userpace for each line. If userspace passes in several lines to execute, the code will do a large read for each line, even though, it is highly likely that the first read from userspace received all of the lines at once. I changed the logic to do a single read from userspace, and to only read from userspace again if not all of the read from userspace made it in. I tested this by adding printk()s and writing files that would test -1, ==, and +1 the buffer size, to make sure that there's no overflows and that if a single line is written with +1 the buffer size, that it fails properly. Link: http://lkml.kernel.org/r/20170209180458.5c829ab2@gandalf.local.homeAcked-by: NMasami Hiramatsu <mhiramat@kernel.org> Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 11 2月, 2017 1 次提交
-
-
由 Steven Rostedt (VMware) 提交于
The GLOB operation "~" should be able to work with the COMM filter key in order to trace programs with a glob. For example echo 'COMM ~ "systemd*"' > events/syscalls/filter Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 03 2月, 2017 8 次提交
-
-
由 Steven Rostedt (VMware) 提交于
Currently, only one function can be written to set_graph_function and set_graph_notrace. The last function in the list will have saved, even though other functions will be added then removed. Change the behavior to be the same as set_ftrace_function as to allow multiple functions to be written. If any one fails, none of them will be added. The addition of the functions are done at the end when the file is closed. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
The hashs ftrace_graph_hash and ftrace_graph_notrace_hash are modified within the graph_lock being held. Holding a pointer to them and passing them along can lead to a use of a stale pointer (fgd->hash). Move assigning the pointer and its use to within the holding of the lock. Note, it's an rcu_sched protected data, and other instances of referencing them are done with preemption disabled. But the file manipuation code must be protected by the lock. The fgd->hash pointer is set to NULL when the lock is being released. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
trace_parser_put() simply frees the allocated parser buffer. But it does not reset the pointer that was freed. This means that if trace_parser_put() is called on the same parser more than once, it will corrupt the allocation system. Setting parser->buffer to NULL after free allows it to be called more than once without any ill effect. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Since reading the set_graph_functions uses seq functions, which sets the file->private_data pointer to a seq_file descriptor. On writes the ftrace_graph_data descriptor is set to file->private_data. But if the file is opened for RDWR, the ftrace_graph_write() will incorrectly use the file->private_data descriptor instead of ((struct seq_file *)file->private_data)->private pointer, and this can crash the kernel. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
fgd->hash is saved and then freed, but is never reset to either ftrace_graph_hash nor ftrace_graph_notrace_hash. But if multiple writes are performed, then the freed hash could be accessed again. # cd /sys/kernel/debug/tracing # head -1000 available_filter_functions > /tmp/funcs # cat /tmp/funcs > set_graph_function Causes: general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC Modules linked in: [...] CPU: 2 PID: 1337 Comm: cat Not tainted 4.10.0-rc2-test-00010-g6b052e9 #32 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012 task: ffff880113a12200 task.stack: ffffc90001940000 RIP: 0010:free_ftrace_hash+0x7c/0x160 RSP: 0018:ffffc90001943db0 EFLAGS: 00010246 RAX: 6b6b6b6b6b6b6b6b RBX: 6b6b6b6b6b6b6b6b RCX: 6b6b6b6b6b6b6b6b RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffff8800ce1e1d40 RBP: ffff8800ce1e1d50 R08: 0000000000000000 R09: 0000000000006400 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: ffff8800ce1e1d40 R14: 0000000000004000 R15: 0000000000000001 FS: 00007f9408a07740(0000) GS:ffff88011e500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000aee1f0 CR3: 0000000116bb4000 CR4: 00000000001406e0 Call Trace: ? ftrace_graph_write+0x150/0x190 ? __vfs_write+0x1f6/0x210 ? __audit_syscall_entry+0x17f/0x200 ? rw_verify_area+0xdb/0x210 ? _cond_resched+0x2b/0x50 ? __sb_start_write+0xb4/0x130 ? vfs_write+0x1c8/0x330 ? SyS_write+0x62/0xf0 ? do_syscall_64+0xa3/0x1b0 ? entry_SYSCALL64_slow_path+0x25/0x25 Code: 01 48 85 db 0f 84 92 00 00 00 b8 01 00 00 00 d3 e0 85 c0 7e 3f 83 e8 01 48 8d 6f 10 45 31 e4 4c 8d 34 c5 08 00 00 00 49 8b 45 08 <4a> 8b 34 20 48 85 f6 74 13 48 8b 1e 48 89 ef e8 20 fa ff ff 48 RIP: free_ftrace_hash+0x7c/0x160 RSP: ffffc90001943db0 ---[ end trace 999b48216bf4b393 ]--- Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
When the set_graph_function or set_graph_notrace contains no records, a banner is displayed of either "#### all functions enabled ####" or "#### all functions disabled ####" respectively. To tell the seq operations to do this, (void *)1 is passed as a return value. Instead of using a hardcoded meaningless variable, define it as a macro. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
This is a micro-optimization, but as it has to deal with a fast path of the function tracer, these optimizations can be noticed. The ftrace_lookup_ip() returns true if the given ip is found in the hash. If it's not found or the hash is NULL, it returns false. But there's some cases that a NULL hash is a true, and the ftrace_hash_empty() is tested before calling ftrace_lookup_ip() in those cases. But as ftrace_lookup_ip() tests that first, that adds a few extra unneeded instructions in those cases. A new static "always_inlined" function is created that does not perform the hash empty test. This most only be used by callers that do the check first anyway, as an empty or NULL hash could cause a crash if a lookup is performed on it. Also add kernel doc for the ftrace_lookup_ip() main function. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Replace the couple of use cases that has small logic to produce the ftrace function key id with a helper function. No need for duplicate code. Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 21 1月, 2017 3 次提交
-
-
由 Namhyung Kim 提交于
Use ftrace_hash instead of a static array of a fixed size. This is useful when a graph filter pattern matches to a large number of functions. Now hash lookup is done with preemption disabled to protect from the hash being changed/freed. Link: http://lkml.kernel.org/r/20170120024447.26097-3-namhyung@kernel.orgSigned-off-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Namhyung Kim 提交于
It will be used when checking graph filter hashes later. Link: http://lkml.kernel.org/r/20170120024447.26097-2-namhyung@kernel.orgSigned-off-by: NNamhyung Kim <namhyung@kernel.org> [ Moved ftrace_hash dec and functions outside of FUNCTION_GRAPH define ] Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Namhyung Kim 提交于
The __ftrace_hash_move() is to allocates properly-sized hash and move entries in the src ftrace_hash. It will be used to set function graph filters which has nothing to do with the dyn_ftrace records. Link: http://lkml.kernel.org/r/20170120024447.26097-1-namhyung@kernel.orgSigned-off-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 19 1月, 2017 2 次提交
-
-
由 Steven Rostedt (VMware) 提交于
The unlikely/likely branch profiler now gets called even if the if statement is a constant (always goes in one direction without a compare). Add a value to denote this in the likely/unlikely tracer as well. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Steven Rostedt (VMware) 提交于
Now that constants are traced, it is useful to see the number of constants that are traced in the likely/unlikely profiler in order to know if they should be ignored or not. The likely/unlikely will display a number after the "correct" number if a "constant" count exists. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 18 1月, 2017 2 次提交
-
-
由 Steven Rostedt (VMware) 提交于
When running the likely/unlikely profiler, one of the results did not look accurate. It noted that the unlikely() in link_path_walk() was 100% incorrect. When I added a trace_printk() to see what was happening there, it became 80% correct! Looking deeper into what whas happening, I found that gcc split that if statement into two paths. One where the if statement became a constant, the other path a variable. The other path had the if statement always hit (making the unlikely there, always false), but since the #define unlikely() has: #define unlikely() (__builtin_constant_p(x) ? !!(x) : __branch_check__(x, 0)) Where constants are ignored by the branch profiler, the "constant" path made by the compiler was ignored, even though it was hit 80% of the time. By just passing the constant value to the __branch_check__() function and tracing it out of line (as always correct, as likely/unlikely isn't a factor for constants), then we get back the accurate readings of branches that were optimized by gcc causing part of the execution to become constant. Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Kenny Yu 提交于
Previously, `create_trace_uprobe` found the *first* occurence of the ':' character when parsing `PATH:OFFSET` for a uprobe. However, if the path contains a ':' character, then the function would parse the path incorrectly. Even worse, if the path does not exist, the subsequent call to `kern_path()` would set `ret` to `ENOENT`, leading to very cryptic errno values in user space. The fix is to find the *last* occurence of ':'. How to repro:: The write fails with "No such file or directory", suggesting incorrectly that the `uprobe_events` file does not exist. $ mkdir testing && cd testing $ cp /bin/bash . $ cp /bin/bash ./bash:with:colon $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this doesn't -bash: echo: write error: No such file or directory With the patch: $ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this still works $ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this works now too! $ cat /sys/kernel/debug/tracing/uprobe_events p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006 p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006 Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.comSigned-off-by: NKenny Yu <kennyyu@fb.com> Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
- 27 12月, 2016 1 次提交
-
-
由 Thomas Gleixner 提交于
The attempt to prevent overwriting an active state resulted in a disaster which effectively disables all dynamically allocated hotplug states. Cleanup the mess. Fixes: dc280d93 ("cpu/hotplug: Prevent overwriting of callbacks") Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Reported-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 12月, 2016 2 次提交
-
-
由 Thomas Gleixner 提交于
ktime_set(S,N) was required for the timespec storage type and is still useful for situations where a Seconds and Nanoseconds part of a time value needs to be converted. For anything where the Seconds argument is 0, this is pointless and can be replaced with a simple assignment. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
由 Thomas Gleixner 提交于
ktime is a union because the initial implementation stored the time in scalar nanoseconds on 64 bit machine and in a endianess optimized timespec variant for 32bit machines. The Y2038 cleanup removed the timespec variant and switched everything to scalar nanoseconds. The union remained, but become completely pointless. Get rid of the union and just keep ktime_t as simple typedef of type s64. The conversion was done with coccinelle and some manual mopping up. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
-
- 25 12月, 2016 4 次提交
-
-
由 Thomas Gleixner 提交于
There is no point in having an extra type for extra confusion. u64 is unambiguous. Conversion was done with the following coccinelle script: @rem@ @@ -typedef u64 cycle_t; @fix@ typedef cycle_t; @@ -cycle_t +u64 Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: John Stultz <john.stultz@linaro.org>
-
由 Thomas Gleixner 提交于
hotcpu_notifier(), cpu_notifier(), __hotcpu_notifier(), __cpu_notifier(), register_hotcpu_notifier(), register_cpu_notifier(), __register_hotcpu_notifier(), __register_cpu_notifier(), unregister_hotcpu_notifier(), unregister_cpu_notifier(), __unregister_hotcpu_notifier(), __unregister_cpu_notifier() are unused now. Remove them and all related code. Remove also the now pointless cpu notifier error injection mechanism. The states can be executed step by step and error rollback is the same as cpu down, so any state transition can be tested w/o requiring the notifier error injection. Some CPU hotplug states are kept as they are (ab)used for hotplug state tracking. Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Developers manage to overwrite states blindly without thought. That's fatal and hard to debug. Add sanity checks to make it fail. This requries to restructure the code so that the dynamic state allocation happens in the same lock protected section as the actual store. Otherwise the previous assignment of 'Reserved' to the name field would trigger the overwrite check. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Linus Torvalds 提交于
This was entirely automated, using the script by Al: PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: NAl Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 23 12月, 2016 1 次提交
-
-
由 Al Viro 提交于
... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 12月, 2016 3 次提交
-
-
由 Boris Ostrovsky 提交于
When ivoked with CPUHP_AP_ONLINE_DYN state __cpuhp_setup_state() is expected to return positive value which is the hotplug state that the routine assigns. Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Alexander Popov 提交于
Subtract KASLR offset from the kernel addresses reported by kcov. Tested on x86_64 and AArch64 (Hikey LeMaker). Link: http://lkml.kernel.org/r/1481417456-28826-3-git-send-email-alex.popov@linux.comSigned-off-by: NAlexander Popov <alex.popov@linux.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Rob Herring <robh@kernel.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Jon Masters <jcm@redhat.com> Cc: David Daney <david.daney@cavium.com> Cc: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Nicolai Stange <nicstange@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Alexander Popov <alex.popov@linux.com> Cc: syzkaller <syzkaller@googlegroups.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mimi Zohar 提交于
The TPM PCRs are only reset on a hard reboot. In order to validate a TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list of the running kernel must be saved and restored on boot. This patch uses the kexec buffer passing mechanism to pass the serialized IMA binary_runtime_measurements to the next kernel. Link: http://lkml.kernel.org/r/1480554346-29071-7-git-send-email-zohar@linux.vnet.ibm.comSigned-off-by: NThiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com> Acked-by: NDmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Andreas Steffen <andreas.steffen@strongswan.org> Cc: Josh Sklar <sklar@linux.vnet.ibm.com> Cc: Dave Young <dyoung@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 12月, 2016 4 次提交
-
-
由 Marcin Nowakowski 提交于
Commit: 72e6ae28 ('ARM: 8043/1: uprobes need icache flush after xol write' ... has introduced an arch-specific method to ensure all caches are flushed appropriately after an instruction is written to an XOL page. However, when the XOL area is created and the out-of-line breakpoint instruction is copied, caches are not flushed at all and stale data may be found in icache. Replace a simple copy_to_page() with arch_uprobe_copy_ixol() to allow the arch to ensure all caches are updated accordingly. This change fixes uprobes on MIPS InterAptiv (tested on Creator Ci40). Signed-off-by: NMarcin Nowakowski <marcin.nowakowski@imgtec.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Victor Kamensky <victor.kamensky@linaro.org> Cc: linux-mips@linux-mips.org Link: http://lkml.kernel.org/r/1481625657-22850-1-git-send-email-marcin.nowakowski@imgtec.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Daniel Borkmann 提交于
Martin reported a verifier issue that hit the BUG_ON() for his test case in the mark_reg_unknown_value() function: [ 202.861380] kernel BUG at kernel/bpf/verifier.c:467! [...] [ 203.291109] Call Trace: [ 203.296501] [<ffffffff811364d5>] mark_map_reg+0x45/0x50 [ 203.308225] [<ffffffff81136558>] mark_map_regs+0x78/0x90 [ 203.320140] [<ffffffff8113938d>] do_check+0x226d/0x2c90 [ 203.331865] [<ffffffff8113a6ab>] bpf_check+0x48b/0x780 [ 203.343403] [<ffffffff81134c8e>] bpf_prog_load+0x27e/0x440 [ 203.355705] [<ffffffff8118a38f>] ? handle_mm_fault+0x11af/0x1230 [ 203.369158] [<ffffffff812d8188>] ? security_capable+0x48/0x60 [ 203.382035] [<ffffffff811351a4>] SyS_bpf+0x124/0x960 [ 203.393185] [<ffffffff810515f6>] ? __do_page_fault+0x276/0x490 [ 203.406258] [<ffffffff816db320>] entry_SYSCALL_64_fastpath+0x13/0x94 This issue got uncovered after the fix in a08dd0da ("bpf: fix regression on verifier pruning wrt map lookups"). The reason why it wasn't noticed before was, because as mentioned in a08dd0da, mark_map_regs() was doing the id matching incorrectly based on the uncached regs[regno].id. So, in the first loop, we walked all regs and as soon as we found regno == i, then this reg's id was cleared when calling mark_reg_unknown_value() thus that every subsequent register was probed against id of 0 (which, in combination with the PTR_TO_MAP_VALUE_OR_NULL type is an invalid condition that no other register state can hold), and therefore wasn't type transitioned such as in the spilled register case for the second loop. Now since that got fixed, it turned out that 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") used mark_reg_unknown_value() incorrectly for the spilled regs, and thus hitting the BUG_ON() in some cases due to regno >= MAX_BPF_REG. Although spilled regs have the same type as the non-spilled regs for the verifier state, that is, struct bpf_reg_state, they are semantically different from the non-spilled regs. In other words, there can be up to 64 (MAX_BPF_STACK / BPF_REG_SIZE) spilled regs in the stack, for example, register R<x> could have been spilled by the program to stack location X, Y, Z, and in mark_map_regs() we need to scan these stack slots of type STACK_SPILL for potential registers that we have to transition from PTR_TO_MAP_VALUE_OR_NULL. Therefore, depending on the location, the spilled_regs regno can be a lot higher than just MAX_BPF_REG's value since we operate on stack instead. The reset in mark_reg_unknown_value() itself is just fine, only that the BUG_ON() was inappropriate for this. Fix it by making a __mark_reg_unknown_value() version that can be called from mark_map_reg() generically; we know for the non-spilled case that the regno is always < MAX_BPF_REG anyway. Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Reported-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Commit aaac3ba9 ("bpf: charge user for creation of BPF maps and programs") made a wrong assumption of charging against prog->pages. Unlike map->pages, prog->pages are still subject to change when we need to expand the program through bpf_prog_realloc(). This can for example happen during verification stage when we need to expand and rewrite parts of the program. Should the required space cross a page boundary, then prog->pages is not the same anymore as its original value that we used to bpf_prog_charge_memlock() on. Thus, we'll hit a wrap-around during bpf_prog_uncharge_memlock() when prog is freed eventually. I noticed this that despite having unlimited memlock, programs suddenly refused to load with EPERM error due to insufficient memlock. There are two ways to fix this issue. One would be to add a cached variable to struct bpf_prog that takes a snapshot of prog->pages at the time of charging. The other approach is to also account for resizes. I chose to go with the latter for a couple of reasons: i) We want accounting rather to be more accurate instead of further fooling limits, ii) adding yet another page counter on struct bpf_prog would also be a waste just for this purpose. We also do want to charge as early as possible to avoid going into the verifier just to find out later on that we crossed limits. The only place that needs to be fixed is bpf_prog_realloc(), since only here we expand the program, so we try to account for the needed delta and should we fail, call-sites check for outcome anyway. On cBPF to eBPF migrations, we don't grab a reference to the user as they are charged differently. With that in place, my test case worked fine. Fixes: aaac3ba9 ("bpf: charge user for creation of BPF maps and programs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Geert rightfully complained that 7bd509e3 ("bpf: add prog_digest and expose it via fdinfo/netlink") added a too large allocation of variable 'raw' from bss section, and should instead be done dynamically: # ./scripts/bloat-o-meter kernel/bpf/core.o.1 kernel/bpf/core.o.2 add/remove: 3/0 grow/shrink: 0/0 up/down: 33291/0 (33291) function old new delta raw - 32832 +32832 [...] Since this is only relevant during program creation path, which can be considered slow-path anyway, lets allocate that dynamically and be not implicitly dependent on verifier mutex. Move bpf_prog_calc_digest() at the beginning of replace_map_fd_with_map_ptr() and also error handling stays straight forward. Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 12月, 2016 1 次提交
-
-
由 Daniel Borkmann 提交于
Commit 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") introduced a regression where existing programs stopped loading due to reaching the verifier's maximum complexity limit, whereas prior to this commit they were loading just fine; the affected program has roughly 2k instructions. What was found is that state pruning couldn't be performed effectively anymore due to mismatches of the verifier's register state, in particular in the id tracking. It doesn't mean that 57a09bf0 is incorrect per se, but rather that verifier needs to perform a lot more work for the same program with regards to involved map lookups. Since commit 57a09bf0 is only about tracking registers with type PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers until they are promoted through pattern matching with a NULL check to either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the id becomes irrelevant for the transitioned types. For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(), but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even transferred further into other types that don't make use of it. Among others, one example is where UNKNOWN_VALUE is set on function call return with RET_INTEGER return type. states_equal() will then fall through the memcmp() on register state; note that the second memcmp() uses offsetofend(), so the id is part of that since d2a4dd37 ("bpf: fix state equivalence"). But the bisect pointed already to 57a09bf0, where we really reach beyond complexity limit. What I found was that states_equal() often failed in this case due to id mismatches in spilled regs with registers in type PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just perform a memcmp() on their reg state and don't have any other optimizations in place, therefore also id was relevant in this case for making a pruning decision. We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE. For the affected program, it resulted in a ~17 fold reduction of complexity and let the program load fine again. Selftest suite also runs fine. The only other place where env->id_gen is used currently is through direct packet access, but for these cases id is long living, thus a different scenario. Also, the current logic in mark_map_regs() is not fully correct when marking NULL branch with UNKNOWN_VALUE. We need to cache the destination reg's id in any case. Otherwise, once we marked that reg as UNKNOWN_VALUE, it's id is reset and any subsequent registers that hold the original id and are of type PTR_TO_MAP_VALUE_OR_NULL won't be marked UNKNOWN_VALUE anymore, since mark_map_reg() reuses the uncached regs[regno].id that was just overridden. Note, we don't need to cache it outside of mark_map_regs(), since it's called once on this_branch and the other time on other_branch, which are both two independent verifier states. A test case for this is added here, too. Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NThomas Graf <tgraf@suug.ch> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-