Commits · 35abb67de744b5dbaec54381f2f9e0246089331d · openanolis / cloud-kernel

20 Jun, 2016 7 commits

tracing: expose current->comm to [ku]probe events · 35abb67d

Committed by Omar Sandoval on Jun 08, 2016

ftrace is very quick to give up on saving the task command line (see
`trace_save_cmdline()`). The workaround for events which really care
about the command line is to explicitly assign it as part of the entry.
However, this doesn't work for kprobe events, as there's no
straightforward way to get access to current->comm. Add a kprobe/uprobe
event variable $comm which provides exactly that.

Link: http://lkml.kernel.org/r/f59b472033b943a370f5f48d0af37698f409108f.1465435894.git.osandov@fb.com

Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Have set_ftrace_pid use the bitmap like events do · 345ddcc8

Committed by Steven Rostedt (Red Hat) on Apr 22, 2016

Convert set_ftrace_pid to use the bitmap like set_event_pid does. This
allows instances to use the pid filtering as well, and will allow for a
function-fork option that sets whether the children of a traced task
should be traced.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move pid_list write processing into its own function · 76c813e2

Committed by Steven Rostedt (Red Hat) on Apr 21, 2016

The addition of PIDs into a pid_list via the write operation of
set_event_pid is a bit complex. The same operation will be needed for
function tracing pids. Move the code into its own generic function in
trace.c, so that we can avoid duplication of this code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move the pid_list seq_file functions to be global · 5cc8976b

Committed by Steven Rostedt (Red Hat) on Apr 20, 2016

To allow other aspects of ftrace to use the pid_list logic, we need to reuse
the seq_file functions. Making the generic part into functions that can be
called by other files will help in this regard.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move filtered_pid helper functions into trace.c · d8275c45

Committed by Steven Rostedt on Apr 14, 2016

As the filtered_pid functions are going to be used by function tracer as
well as trace_events, move the code into the generic trace.c file.

The functions moved are:

 trace_find_filtered_pid()
 trace_ignore_this_task()
 trace_filter_add_remove_task()

Kernel Doc text was also added.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Make the pid filtering helper functions global · 4e267db1

Committed by Steven Rostedt on Apr 14, 2016

Make the functions used for pid filtering global for tracing, such that the
function tracer can use the pid code as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Handle NULL formats in hold_module_trace_bprintk_format() · 70c8217a

Committed by Steven Rostedt (Red Hat) on Jun 17, 2016

If a task uses a non constant string for the format parameter in
trace_printk(), then the trace_printk_fmt variable is set to NULL. This
variable is then saved in the __trace_printk_fmt section.

The function hold_module_trace_bprintk_format() checks to see if duplicate
formats are used by modules, and reuses them if so (saves them to the list
if it is new). But this function calls lookup_format() that does a strcmp()
to the value (which is now NULL) and can cause a kernel oops.

This wasn't an issue till 3debb0a9 ("tracing: Fix trace_printk() to print
when not using bprintk()") which added "__used" to the trace_printk_fmt
variable, and before that, the kernel simply optimized it out (no NULL value
was saved).

The fix is simply to handle the NULL pointer in lookup_format() and have the
caller ignore the value if it was NULL.
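
The shape of that fix can be sketched in userspace C. This is only an illustration under stated assumptions: saved_formats, lookup_format_sketch() and count_new_formats_sketch() are hypothetical names, and the kernel's real list handling and return convention are not shown in this message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the list of format strings already saved. */
static const char *saved_formats[] = { "irq %d\n", "pid %d comm %s\n" };

/*
 * Return the saved copy of fmt, or NULL if no copy exists.
 * A NULL fmt is rejected up front so strcmp() never dereferences it.
 */
static const char *lookup_format_sketch(const char *fmt)
{
    size_t i;

    if (!fmt)
        return NULL;

    for (i = 0; i < sizeof(saved_formats) / sizeof(saved_formats[0]); i++) {
        if (strcmp(saved_formats[i], fmt) == 0)
            return saved_formats[i];
    }
    return NULL;
}

/* Caller side: NULL entries are skipped instead of being treated as new. */
static size_t count_new_formats_sketch(const char *const *fmts, size_t n)
{
    size_t i, fresh = 0;

    for (i = 0; i < n; i++) {
        if (!fmts[i])
            continue;   /* non-constant format string: nothing to save */
        if (!lookup_format_sketch(fmts[i]))
            fresh++;
    }
    return fresh;
}
```

The point of the sketch is that both the lookup and its caller tolerate a NULL entry, so no string comparison is ever attempted on it.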

Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com

Reported-by: xingzhen <zhengjun.xing@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 3debb0a9 ("tracing: Fix trace_printk() to print when not using bprintk()")
Cc: stable@vger.kernel.org # v3.5+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

08 Jun, 2016 1 commit

bpf, trace: use READ_ONCE for retrieving file ptr · 5b6c1b4d

Committed by Daniel Borkmann on Jun 04, 2016

In bpf_perf_event_read() and bpf_perf_event_output(), we must use
READ_ONCE() for fetching the struct file pointer, which could get
updated concurrently, so we must prevent the compiler from potential
refetching.

We already do this with tail calls for fetching the related bpf_prog,
but not so on stored perf events. Semantics for both are the same
with regards to updates.
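
A minimal userspace sketch of the pattern follows. READ_ONCE_SKETCH is a simplified stand-in for the kernel macro (a single volatile load, relying on the GCC/Clang __typeof__ extension), and file_sketch and entry_sketch are hypothetical types invented for the illustration.

```c
#include <stddef.h>

/*
 * Simplified stand-in for READ_ONCE(): the volatile access asks the
 * compiler to perform exactly one load of x at this point.
 */
#define READ_ONCE_SKETCH(x) (*(volatile __typeof__(x) *)&(x))

struct file_sketch { int id; };                     /* stands in for struct file */
struct entry_sketch { struct file_sketch *file; };  /* slot that may be updated concurrently */

/* Take one snapshot of the pointer and use only that snapshot afterwards. */
static int read_file_id(struct entry_sketch *e)
{
    struct file_sketch *file = READ_ONCE_SKETCH(e->file);

    if (!file)
        return -1;
    return file->id;    /* uses the snapshot, not a second load of e->file */
}
```

Without the single snapshot, the compiler would be free to load e->file once for the NULL check and again for the dereference, and the two loads could observe different values.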

Fixes: a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
Fixes: 35578d79 ("bpf: Implement function bpf_perf_event_read() that get the selected hardware PMU conuter")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 May, 2016 1 commit

ftrace: Don't disable irqs when taking the tasklist_lock read_lock · 6112a300

Committed by Soumya PN on May 17, 2016

In ftrace.c, inside the function alloc_retstack_tasklist() (which is
invoked when function_graph tracing is on), the tasklist_lock is held
as reader with irqs disabled while iterating through the list of
threads. The tasklist_lock is never write_locked in interrupt context,
so it is safe to not disable interrupts for the duration of the
read_lock in this block, which can be significant given that the block
iterates through all threads. Hence change the code to call read_lock()
and read_unlock() instead of read_lock_irqsave() and
read_unlock_irqrestore().

A similar change was made in commits 8063e41d ("tracing: Change
syscall_*regfunc() to check PF_KTHREAD and use for_each_process_thread()")
and 3472eaa1 ("sched: normalize_rt_tasks(): Don't use _irqsave for
tasklist_lock, use task_rq_lock()").

Link: http://lkml.kernel.org/r/1463500874-77480-1-git-send-email-soumya.p.n@hpe.com

Signed-off-by: Soumya PN <soumya.p.n@hpe.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

14 May, 2016 1 commit

ring-buffer: Prevent overflow of size in ring_buffer_resize() · 59643d15

Committed by Steven Rostedt (Red Hat) on May 13, 2016

If the size passed to ring_buffer_resize() is greater than ULONG_MAX - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.

Here are the details:

  # echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb

tracing_entries_write() processes this and converts kb to bytes.

 18014398509481980 << 10 = 18446744073709547520

and this is passed to ring_buffer_resize() as unsigned long size.

 size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);

Where DIV_ROUND_UP(a, b) is (a + b - 1)/b

BUF_PAGE_SIZE is 4080 and here

 18446744073709547520 + 4080 - 1 = 18446744073709551599

where 18446744073709551599 is still smaller than 2^64

 2^64 - 18446744073709551599 = 17

But now 18446744073709551599 / 4080 = 4521260802379792

and size = size * 4080 = 18446744073709551360

This is checked to make sure it's still greater than 2 * 4080,
which it is.

Then we convert to the number of buffer pages needed.

 nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)

but this time size is 18446744073709551360 and

 2^64 - (18446744073709551360 + 4080 - 1) = -3823

Thus it overflows and the resulting number is less than 4080, which makes

  3823 / 4080 = 0

and nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.

There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.
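
The arithmetic above can be reproduced in userspace with 64-bit unsigned integers. This is a sketch, not the patch itself: the function names are hypothetical, and nr_pages_single_round() only shows that dropping the second rounding step avoids this particular wrap (a size within 4079 bytes of 2^64 would still wrap in it).

```c
#include <stdint.h>

#define BUF_PAGE_SIZE_SKETCH 4080ULL
#define DIV_ROUND_UP_SKETCH(a, b) (((a) + (b) - 1) / (b))

/* Flow described above: round up to pages, scale back to bytes, divide again. */
static uint64_t nr_pages_double_round(uint64_t size)
{
    size = DIV_ROUND_UP_SKETCH(size, BUF_PAGE_SIZE_SKETCH);
    size *= BUF_PAGE_SIZE_SKETCH;
    /* size + 4079 can wrap past 2^64 here, leaving a tiny numerator */
    return DIV_ROUND_UP_SKETCH(size, BUF_PAGE_SIZE_SKETCH);
}

/* Single rounding step: the page count is computed once from the request. */
static uint64_t nr_pages_single_round(uint64_t size)
{
    return DIV_ROUND_UP_SKETCH(size, BUF_PAGE_SIZE_SKETCH);
}
```

For the size from the example, 18446744073709547520, the double-rounding version yields 0 pages while the single-rounding version yields 4521260802379792.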

Cc: stable@vger.kernel.org # 3.5+
Fixes: 83f40318 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

13 May, 2016 1 commit

ring-buffer: Use long for nr_pages to avoid overflow failures · 9b94a8fb

Committed by Steven Rostedt (Red Hat) on May 12, 2016

The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.

For example, the following will cause the ring buffer to crash:

 # cd /sys/kernel/debug/tracing
 # echo 10 > buffer_size_kb
 # echo 8556384240 > buffer_size_kb

Then you get the warning of:

 WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260

Which is:

  RB_WARN_ON(cpu_buffer, nr_removed);

Note each ring buffer page holds 4080 bytes.

This is because:

 1) 10 causes the ring buffer to have 3 pages.
    (10kb requires 3 pages of 4080 bytes to hold)

 2) (2^31 / 2^10  + 1) * 4080 = 8556384240
    The value written into buffer_size_kb is shifted by 10 and then passed
    to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760

 3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
    which is 4080. 8761737461760 / 4080 = 2147484672

 4) nr_pages is subtracted from the current nr_pages (3) and we get:
    2147484669. This value is saved in a signed integer nr_pages_to_update

 5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
    turns into the value of -2147482627

 6) As the value is a negative number, in update_pages_handler() it is
    negated and passed to rb_remove_pages() and 2147482627 pages will
    be removed, which is much larger than 3 and it causes the warning
    because not all the pages asked to be removed were removed.
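
Steps 2 through 5 can be checked with a short userspace sketch. The function names are hypothetical, and the narrowing conversion to a 32-bit signed value is implementation-defined in C; the result shown assumes the modulo-2^32 behaviour that GCC and Clang document.

```c
#include <stdint.h>

#define BUF_PAGE_SIZE_SKETCH 4080

/* Page delta stored in a 32-bit signed int, as in the description above. */
static int32_t pages_to_update_int(uint64_t size_bytes, int64_t cur_pages)
{
    int64_t nr_pages = (int64_t)(size_bytes / BUF_PAGE_SIZE_SKETCH);

    /* Value does not fit in 32 bits: GCC/Clang wrap it modulo 2^32. */
    return (int32_t)(nr_pages - cur_pages);
}

/* Same computation with a 64-bit result, which keeps the full value. */
static int64_t pages_to_update_long(uint64_t size_bytes, int64_t cur_pages)
{
    int64_t nr_pages = (int64_t)(size_bytes / BUF_PAGE_SIZE_SKETCH);

    return nr_pages - cur_pages;
}
```

With size_bytes = 8556384240 << 10 and cur_pages = 3, the 32-bit version returns -2147482627 and the 64-bit version returns 2147484669, matching the numbers in the list.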

Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001

Cc: stable@vger.kernel.org # 2.6.28+
Fixes: 7a8e76a3 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

10 May, 2016 2 commits

blktrace: add missed mask name · 8d1547e0

Committed by Shaohua Li on May 09, 2016

BLK_TC_NOTIFY is missing from mask_maps, so we can't print out notify or
set the mask with the 'notify' name.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

blktrace: delete garbage for message trace · b7d7641e

Committed by Shaohua Li on May 09, 2016

Commit f4a1d08c introduced a regression. Originally, for
BLK_TN_MESSAGE, we added the message to the trace and returned. That
commit dropped the early return and so adds garbage info.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>

04 May, 2016 2 commits

tracing: Use temp buffer when filtering events · 0fc1b09f

Committed by Steven Rostedt (Red Hat) on May 03, 2016

Filtering of events requires the data to be written to the ring buffer
before it can be decided to filter or not. This is because the parameters of
the filter are based on the result that is written to the ring buffer and
not on the parameters that are passed into the trace functions.

The ftrace ring buffer is optimized for writing into the ring buffer and
committing. The discard procedure used when filtering decides the event
should be discarded is much more heavy weight. Thus, using a temporary
buffer when filtering events can speed things up drastically.

Without a temp buffer we have:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.790706626 seconds time elapsed ( +-  0.71% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.566904059 seconds time elapsed ( +-  0.27% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.690598511 seconds time elapsed ( +-  0.19% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.707486364 seconds time elapsed ( +-  0.30% )

The first run above is without any tracing, just to get a base figure.
hackbench takes ~0.79 seconds to run on the system.

The second run enables tracing all events where nothing is filtered. This
increases the time by 100% and hackbench takes 1.57 seconds to run.

The third run filters all events where the preempt count will equal "20"
(this should never happen) thus all events are discarded. This takes 1.69
seconds to run. This is 10% slower than just committing the events!

The last run enables all events and filters where the filter will commit all
events, and this takes 1.70 seconds to run. The filtering overhead is
approximately 10%. Thus, the discard and commit of an event from the ring
buffer may be about the same time.

With this patch, the numbers change:

 # trace-cmd start -p nop
 # perf stat -r 10 hackbench 50
       0.778233033 seconds time elapsed ( +-  0.38% )

 # trace-cmd start -e all
 # perf stat -r 10 hackbench 50
       1.582102692 seconds time elapsed ( +-  0.28% )

 # trace-cmd start -e all -f 'common_preempt_count==20'
 # perf stat -r 10 hackbench 50
       1.309230710 seconds time elapsed ( +-  0.22% )

 # trace-cmd start -e all -f 'common_preempt_count!=20'
 # perf stat -r 10 hackbench 50
       1.786001924 seconds time elapsed ( +-  0.20% )

The first run is again the base with no tracing.

The second run is all tracing with no filtering. It is a little slower, but
that may be well within the noise.

The third run shows that discarding all events only took 1.3 seconds. This
is a speed up of 23%! The discard is much faster than even the commit.

The one downside is shown in the last run. Events that are not discarded by
the filter will take longer to add, this is due to the extra copy of the
event.

Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Don't display trigger file for events that can't be enabled · 854145e0

Committed by Chunyu Hu on May 03, 2016

Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when setting up triggers.

Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.

Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com

Cc: stable@vger.kernel.org # 3.14+
Fixes: 85f2b082 ("tracing: Add basic event trigger framework")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

03 May, 2016 1 commit

tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic · dcb0b557

Committed by Steven Rostedt (Red Hat) on May 02, 2016

Nothing sets TRACE_EVENT_FL_USE_CALL_FILTER anymore. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

30 Apr, 2016 6 commits

tracing: Remove unused function trace_current_buffer_lock_reserve() · 904d1857

Committed by Steven Rostedt (Red Hat) on Apr 29, 2016

trace_current_buffer_lock_reserve() has no more users. Remove it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Remove one use of trace_current_buffer_lock_reserve() · 9b9db275

Committed by Steven Rostedt (Red Hat) on Apr 29, 2016

The only user of trace_current_buffer_lock_reserve() is in the boot up self
tests. Restructure the code a little to have that code use what everything
else uses: trace_event_buffer_lock_reserve().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL · 33fddff2

Committed by Steven Rostedt (Red Hat) on Apr 29, 2016

There's no real difference between trace_buffer_unlock_commit() and
trace_buffer_unlock_commit_regs() except that the former passes NULL to
ftrace_stack_trace() instead of regs. Have the former be a static inline of
the latter which passes NULL for regs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Remove unused function trace_current_buffer_discard_commit() · a9fe48dc

Committed by Steven Rostedt (Red Hat) on Apr 29, 2016

The function trace_current_buffer_discard_commit() has no callers, remove
it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move trace_buffer_unlock_commit{_regs}() to local header · fa66ddb8

Committed by Steven Rostedt (Red Hat) on Apr 28, 2016

The functions trace_buffer_unlock_commit() and the _regs() version are only
used within the kernel/trace directory. Move them to the local header and
remove the export as well.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Fold filter_check_discard() into its only user · 9cbb1506

Committed by Steven Rostedt (Red Hat) on Apr 27, 2016

The function filter_check_discard() is small and only called by one user.
Its code can be folded into that one caller to make the code a bit less
complex.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

27 Apr, 2016 4 commits

tracing: Make filter_check_discard() local · 65da9a0a

Committed by Steven Rostedt (Red Hat) on Apr 27, 2016

Nothing outside of the tracing directory calls filter_check_discard() or
check_filter_check_discard(). They should not be called by modules. Move
their prototypes into the local tracing header and remove their
EXPORT_SYMBOL() macros.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Move event_trigger_unlock_commit{_regs}() to local header · dad56ee7

Committed by Steven Rostedt (Red Hat) on Apr 26, 2016

The functions event_trigger_unlock_commit() and
event_trigger_unlock_commit_regs() are no longer used outside the tracing
system. Move them out of the generic headers and into the local one,
along with __event_trigger_test_discard(), which is only used by them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

ftrace: Match dot symbols when searching functions on ppc64 · 7132e2d6

Committed by Thiago Jung Bauermann on Apr 25, 2016

In the ppc64 big endian ABI, function symbols point to function
descriptors. The symbols which point to the function entry points
have a dot in front of the function name. Consequently, when the
ftrace filter mechanism searches for the symbol corresponding to
an entry point address, it gets the dot symbol.

As a result, ftrace filter users have to be aware of this ABI detail on
ppc64 and prepend a dot to the function name when setting the filter.

The perf probe command insulates the user from this by ignoring the dot
in front of the symbol name when matching function names to symbols,
but the sysfs interface does not. This patch makes the ftrace filter
mechanism do the same when searching symbols.
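
The matching rule can be sketched in plain C as follows. This is an assumption-laden illustration, not the patch: skip_dot_sketch() and filter_matches_sketch() are hypothetical helpers, and the real filter code also handles glob patterns, which are left out here.

```c
#include <string.h>

/* Skip one leading '.' so that "._do_fork" compares as "_do_fork". */
static const char *skip_dot_sketch(const char *sym)
{
    return sym[0] == '.' ? sym + 1 : sym;
}

/* Return 1 if the user-supplied filter names the given symbol, else 0. */
static int filter_matches_sketch(const char *symbol, const char *filter)
{
    return strcmp(skip_dot_sketch(symbol), filter) == 0;
}
```

Under this rule a filter written as _do_fork matches the dot symbol ._do_fork, so the user no longer needs to know about the ABI detail.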

Fixes the following failure in ftracetest's kprobe_ftrace.tc:

  .../kprobe_ftrace.tc: line 9: echo: write error: Invalid argument

That failure is on this line of kprobe_ftrace.tc:

  echo _do_fork > set_ftrace_filter

This is because there's no _do_fork entry in the functions list:

  # cat available_filter_functions | grep _do_fork
  ._do_fork

This change introduces no regressions on the perf and ftracetest
testsuite results.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

tracing: Don't use the address of the buffer array name in copy_from_user · 4afe6495

Committed by Wang Xiaoqiang on Apr 18, 2016

With the following code snippet:

    ...
    char buf[64];
    ...
    if (copy_from_user(&buf, ubuf, cnt))
    ...

Although the value of "&buf" equals "buf", there is no need to take
the address of "buf" again. Use "buf" instead of "&buf".
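
A small standalone sketch of why the two spellings pass the same address yet are not the same expression: buf decays to a char * while &buf has type char (*)[64], so pointer arithmetic on them differs. The function names are hypothetical.

```c
#include <stddef.h>

/* Returns 1: buf and &buf hold the same address. */
static int same_address_sketch(void)
{
    char buf[64];

    return (void *)buf == (void *)&buf;
}

/* buf + 1 advances by one element (1 byte). */
static size_t step_of_buf(void)
{
    char buf[64];

    return (size_t)((char *)(buf + 1) - (char *)buf);
}

/* &buf + 1 advances by the whole array (64 bytes). */
static size_t step_of_addr_buf(void)
{
    char buf[64];

    return (size_t)((char *)(&buf + 1) - (char *)buf);
}
```

Passing buf keeps the argument type in line with what a byte-copy routine expects, which is the tidier of the two equivalent calls.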

Link: http://lkml.kernel.org/r/20160418152329.18b72bea@debian

Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

26 Apr, 2016 4 commits

tracing: Handle tracing_map_alloc_elts() error path correctly · 6e4cf657

Committed by Tom Zanussi on Apr 25, 2016

If tracing_map_elt_alloc() fails, it will return ERR_PTR() instead of
NULL, so change the check to IS_ERR(). We also need to set the
failed entry in the map->elts array to NULL instead of ERR_PTR() so
tracing_map_free_elts() doesn't try freeing an ERR_PTR().

tracing_map_free_elts() should also zero out what it frees so a
reentrant call won't find previously freed elements.
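
The error-pointer convention and the two fixes can be sketched in userspace. Everything suffixed _sketch is a hypothetical stand-in: err_ptr_sketch() and is_err_sketch() imitate ERR_PTR() and IS_ERR(), and the element array is assumed to start out zeroed, as it would after a zeroing allocation.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno as a pointer value near the top of the address space. */
static void *err_ptr_sketch(long err) { return (void *)(intptr_t)err; }

/* True only for pointers in the encoded-errno range; NULL is not an error here. */
static int is_err_sketch(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

/* Hypothetical allocator: reports failure as an encoded errno, never NULL. */
static void *elt_alloc_sketch(int fail)
{
    return fail ? err_ptr_sketch(-ENOMEM) : malloc(16);
}

/* Fill elts[]; on failure store NULL in the slot so a later free pass is safe. */
static int alloc_elts_sketch(void **elts, int n, int fail_at)
{
    for (int i = 0; i < n; i++) {
        elts[i] = elt_alloc_sketch(i == fail_at);
        if (is_err_sketch(elts[i])) {
            elts[i] = NULL;
            return -ENOMEM;
        }
    }
    return 0;
}

/* Free and zero each slot so a repeated call finds nothing left to free. */
static void free_elts_sketch(void **elts, int n)
{
    for (int i = 0; i < n; i++) {
        free(elts[i]);      /* free(NULL) is a no-op */
        elts[i] = NULL;
    }
}
```

A NULL check on the allocator's result would never fire in this scheme, which is why the check has to be the error-range test.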

Link: http://lkml.kernel.org/r/f29d03b00bce3aac8cf151a8a30e6c83e5fee66d.1461610073.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add check for NULL event field when creating hist field · 432480c5

Committed by Tom Zanussi on Apr 25, 2016

Smatch flagged create_hist_field() as possibly being able to
dereference a NULL pointer, although the current code exits in all
cases where the event field could be NULL, so it's not actually a
problem.

Still, to prevent future changes to the code from overlooking new
cases, make the NULL pointer check explicit and warn once in that
case.

Link: http://lkml.kernel.org/r/cfbc003f534a3e441b4313272fd412310aba6336.1461610073.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: checking for NULL instead of IS_ERR() · 4812952f

Committed by Dan Carpenter on Apr 23, 2016

tracing_map_elt_alloc() returns ERR_PTRs on error, never NULL.

Fixes: 08d43a5f ('tracing: Add lock-free tracing_map')
Link: http://lkml.kernel.org/r/20160423102347.GA11136@mwanda

Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Do not inherit event-fork option for instances · 20550622

Committed by Steven Rostedt (Red Hat) on Apr 25, 2016

As the event-fork option requires doing work when enabled and disabled, it
can not be passed down to created instances. The instance must clear this
flag when it is created, and must clear it when it's removed.

As more options may be created with this need, a macro ZEROED_TRACE_FLAGS is
created that holds the flags that must not be inherited from the top level
instance, and must be cleared on removal of instances.
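
The masking idea can be sketched as below. The flag names and bit positions are invented for the illustration; only the pattern of clearing a fixed set of flags when deriving an instance's options comes from the description above.

```c
#define FLAG_OVERWRITE_SKETCH   (1u << 0)   /* hypothetical ordinary option */
#define FLAG_EVENT_FORK_SKETCH  (1u << 1)   /* stands in for the event-fork option */

/* Flags that a new instance must not inherit. */
#define ZEROED_FLAGS_SKETCH     (FLAG_EVENT_FORK_SKETCH)

/* Derive a new instance's flags from the top level, dropping the zeroed set. */
static unsigned int new_instance_flags_sketch(unsigned int top_level_flags)
{
    return top_level_flags & ~ZEROED_FLAGS_SKETCH;
}
```

Collecting such flags in one mask means a future option with the same need only has to be added to that mask.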

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

20 Apr, 2016 10 commits

bpf: add event output helper for notifications/sampling/logging · bd570ff9

Committed by Daniel Borkmann on Apr 18, 2016

This patch adds a new helper for cls/act programs that can push events
to user space applications. For networking, this can be f.e. for sampling,
debugging, logging purposes or pushing of arbitrary wake-up events. The
idea is similar to a43eec30 ("bpf: introduce bpf_perf_event_output()
helper") and 39111695 ("samples: bpf: add bpf_perf_event_output example").

The eBPF program utilizes a perf event array map that user space populates
with fds from perf_event_open(), the eBPF program calls into the helper
f.e. as skb_event_output(skb, &my_map, BPF_F_CURRENT_CPU, raw, sizeof(raw))
so that the raw data is pushed into the fd f.e. at the map index of the
current CPU.

User space can poll/mmap/etc on this and has a data channel for receiving
events that can be post-processed. The nice thing is that since the eBPF
program and user space application making use of it are tightly coupled,
they can define their own arbitrary raw data format and what/when they
want to push.

While f.e. packet headers could be one part of the meta data that is being
pushed, this is not a substitute for things like packet sockets as whole
packet is not being pushed and push is only done in a single direction.
Intention is more of a generically usable, efficient event pipe to applications.
Workflow is that tc can pin the map and applications can attach themselves
e.g. after cls/act setup to one or multiple map slots, demuxing is done by
the eBPF program.

Adding this facility is with minimal effort, it reuses the helper
introduced in a43eec30 ("bpf: introduce bpf_perf_event_output() helper")
and we get its functionality for free by overloading its BPF_FUNC_ identifier
for cls/act programs, ctx is currently unused, but will be made use of in
future. Example will be added to iproute2's BPF example files.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_output · 1e33759c

Committed by Daniel Borkmann on Apr 18, 2016

Add a BPF_F_CURRENT_CPU flag to optimize the use-case where user space has
per-CPU ring buffers and the eBPF program pushes the data into the current
CPU's ring buffer which saves us an extra helper function call in eBPF.
Also, make sure to properly reserve the remaining flags which are not used.
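
One way such a flags word could be decoded is sketched below. The layout is an assumption made for the illustration (index in the low 32 bits, an all-ones index meaning "current CPU", upper bits reserved), and the names are hypothetical; the message above does not spell out the encoding.

```c
#include <stdint.h>

#define INDEX_MASK_SKETCH   0xffffffffULL       /* assumed: low 32 bits carry the map index */
#define CURRENT_CPU_SKETCH  INDEX_MASK_SKETCH   /* assumed: all-ones index selects the current CPU */

/* Resolve flags to a map index, or return -1 if any reserved bit is set. */
static int64_t resolve_index_sketch(uint64_t flags, uint32_t current_cpu)
{
    uint64_t index = flags & INDEX_MASK_SKETCH;

    if (flags & ~INDEX_MASK_SKETCH)
        return -1;              /* reserved bits must stay zero */
    if (index == CURRENT_CPU_SKETCH)
        index = current_cpu;    /* no separate helper call needed to learn the CPU */
    return (int64_t)index;
}
```

Rejecting reserved bits up front is what keeps them available for later extensions.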

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tracing: Fix unsigned comparison to zero in hist trigger code · d50c744e

Committed by Steven Rostedt (Red Hat) on Mar 08, 2016

Fengguang Wu's bot found two comparisons of unsigned integers to zero. These
were real bugs, as the checks would miss negative error values: an unsigned
value is never less than zero.

trace_events_hist.c:426:6-9: WARNING: Unsigned expression compared with zero: idx < 0
trace_events_hist.c:568:5-14: WARNING: Unsigned expression compared with zero: n_entries < 0
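
The class of bug behind these warnings can be shown with a short sketch. find_slot_sketch() is a hypothetical helper that signals failure with a negative return value; a compiler may warn about the always-false comparison in the first function, which is the same condition the bot reported.

```c
/* Hypothetical helper that reports failure as a negative value. */
static int find_slot_sketch(int want_error)
{
    return want_error ? -1 : 3;
}

/* Buggy pattern: the result lands in an unsigned variable, so "< 0" is never true. */
static int failed_unsigned(int want_error)
{
    unsigned int idx = find_slot_sketch(want_error);

    return idx < 0;
}

/* Fixed pattern: a signed variable keeps the negative error visible. */
static int failed_signed(int want_error)
{
    int idx = find_slot_sketch(want_error);

    return idx < 0;
}
```

With the unsigned variable the error return turns into a large positive value and is treated as a valid result.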

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add hist trigger 'log2' modifier · 4b94f5b7

Committed by Namhyung Kim on Mar 03, 2016

Allow users to have numeric fields displayed as log2 values in case
value range is very wide by appending '.log2' to field names.

For example,

  # echo 'hist:key=bytes_req' > kmalloc/trigger
  # cat kmalloc/hist

  { bytes_req:        504 } hitcount:          1
  { bytes_req:         11 } hitcount:          1
  { bytes_req:        104 } hitcount:          1
  { bytes_req:         48 } hitcount:          1
  { bytes_req:       2048 } hitcount:          1
  { bytes_req:       4096 } hitcount:          1
  { bytes_req:        240 } hitcount:          1
  { bytes_req:        392 } hitcount:          1
  { bytes_req:         13 } hitcount:          1
  { bytes_req:         28 } hitcount:          1
  { bytes_req:         12 } hitcount:          1
  { bytes_req:         64 } hitcount:          2
  { bytes_req:        128 } hitcount:          2
  { bytes_req:         32 } hitcount:          2
  { bytes_req:          8 } hitcount:         11
  { bytes_req:         10 } hitcount:         13
  { bytes_req:         24 } hitcount:         25
  { bytes_req:        160 } hitcount:         29
  { bytes_req:         16 } hitcount:         33
  { bytes_req:         80 } hitcount:         36

When using '.log2' modifier, the output looks like:

  # echo 'hist:key=bytes_req.log2' > kmalloc/trigger
  # cat kmalloc/hist

  { bytes_req: ~ 2^12 } hitcount:          1
  { bytes_req: ~ 2^11 } hitcount:          1
  { bytes_req: ~ 2^9  } hitcount:          2
  { bytes_req: ~ 2^6  } hitcount:          3
  { bytes_req: ~ 2^3  } hitcount:         13
  { bytes_req: ~ 2^5  } hitcount:         19
  { bytes_req: ~ 2^8  } hitcount:         49
  { bytes_req: ~ 2^7  } hitcount:         57
  { bytes_req: ~ 2^4  } hitcount:         74
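
A bucket function in this spirit can be sketched as follows. Rounding up to the next power of two is an assumption; it is consistent with the sample output (504 and 392 would account for the two hits at 2^9), but the message does not state the rounding rule. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Smallest k such that 2^k >= val. Values 0 and 1 both map to bucket 0,
 * and k is capped at 63 in this sketch.
 */
static unsigned int log2_bucket_sketch(uint64_t val)
{
    unsigned int k = 0;

    while (k < 63 && (1ULL << k) < val)
        k++;
    return k;
}
```

Grouping by such a bucket collapses a wide range of byte counts into a handful of rows, which is what makes the second listing so much shorter than the first.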

Link: http://lkml.kernel.org/r/7ff396b246c6a881f46b979735fddf05a0d6c71a.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add support for named hist triggers · 5463bfda

Committed by Tom Zanussi on Mar 03, 2016

Allow users to define 'named' hist triggers.  All triggers created
with the same 'name=xxx' option will update the same shared histogram
data.

This expands the hist trigger syntax from this:

    # echo hist:keys=xxx ... [ if filter] > event/trigger

to this:

    # echo hist:name=xxx:keys=xxx ... [ if filter] > event/trigger

Named histograms must use a 'compatible' set of keys and values, which
means each event added to a set of named triggers must have the same
names and types.

Reading the 'hist' file of any of the participating events will
produce the same output as any other participating event, which is to
be expected since they share the same data.

Link: http://lkml.kernel.org/r/1dbc84ee3322a75daaf5b3ef1d0cc0a2fb682fc7.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add support for named triggers · db1388b4

Committed by Tom Zanussi on Mar 03, 2016

Named triggers are sets of triggers that share a common set of trigger
data. An example of functionality that could benefit from this type
of capability would be a set of inlined probes that would each
contribute event counts, for example, to a shared counter data
structure.

The first named trigger registered with a given name owns the common
trigger data that the others subsequently registered with the same
name will reference. The functions defined here allow users to add,
delete, and find named triggers.

It also adds functions to pause and unpause named triggers; since
named triggers act upon common data, they should also be paused and
unpaused as a group.

Link: http://lkml.kernel.org/r/c09ff648360f65b10a3e321eddafe18060b4a04f.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add support for multiple hist triggers per event · 52a7f16d

Committed by Tom Zanussi on Mar 03, 2016

Allow users to define any number of hist triggers per trace event.
The triggers on a given event may differ by key, value, or filter.

Reading the event's 'hist' file will display the output of all the
hist triggers defined on an event concatenated in the order they were
defined.

Link: http://lkml.kernel.org/r/48a0c8dd34c344571de880fb35e211c6d9a28961.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Add enable_hist/disable_hist triggers · d0bad49b

Committed by Tom Zanussi on Mar 03, 2016

Similar to enable_event/disable_event triggers, these triggers enable
and disable the aggregation of events into maps rather than enabling
and disabling their writing into the trace buffer.

They can be used to automatically start and stop hist triggers based
on a matching filter condition.

If there's a paused hist trigger on system:event, the following would
start it when the filter condition was hit:

  # echo enable_hist:system:event [ if filter] > event/trigger

And the following would disable a running system:event hist trigger:

  # echo disable_hist:system:event [ if filter] > event/trigger

See Documentation/trace/events.txt for real examples.

Link: http://lkml.kernel.org/r/f812f086e52c8b7c8ad5443487375e03c96a601f.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Remove restriction on string position in hist trigger keys · 6a475cb1

Committed by Tom Zanussi on Mar 03, 2016

If we assume the maximum size for a string field, we don't have to
worry about its position. Since we only allow two keys in a compound
key and having more than one string key in a given compound key
doesn't make much sense anyway, trading a bit of extra space instead
of introducing an arbitrary restriction makes more sense.

We also need to use the event field size for static strings when
copying the contents, otherwise we get random garbage in the key.

Also, cast string return values to avoid warnings on 32-bit compiles.

Finally, rearrange the code without changing any functionality by
moving the compound key updating code into a separate function.

Link: http://lkml.kernel.org/r/8976e1ab04b66bc2700ad1ed0768a2de85ac1983.1457029949.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reviewed-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

tracing: Support string type key properly · 79e577cb

Committed by Namhyung Kim on Mar 03, 2016

The string in a trace event is usually recorded as a dynamic array, which
is variable length.  But the current hist code only supports fixed-length
arrays, so it cannot support most strings.

This patch fixes it by checking the filter_type of the field and getting
the proper pointer with it.  With this, it can get a histogram of exec()
based on filenames like below:

  # cd /sys/kernel/tracing/events/sched/sched_process_exec
  # echo 'hist:key=filename' > trigger
  # ps
   PID TTY       TIME CMD
     1 ?     00:00:00 init
    29 ?     00:00:00 sh
    38 ?     00:00:00 ps
  # ls
  enable  filter  format  hist  id  trigger
  # cat hist
  # trigger info: hist:keys=filename:vals=hitcount:sort=hitcount:size=2048 [active]

  { filename: /usr/bin/ps                         } hitcount:          1
  { filename: /usr/bin/ls                         } hitcount:          1
  { filename: /usr/bin/cat                        } hitcount:          1

  Totals:
      Hits: 3
      Entries: 3
      Dropped: 0

Link: http://lkml.kernel.org/r/610180d6df0cfdf11ee205452f3b241dea657233.1457029949.git.tom.zanussi@linux.intel.com

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ Added (unsigned long) typecast to fix compile warning ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

openanolis / cloud-kernel synced successfully more than 1 year ago
