1. 11 Apr 2018, 1 commit
    • bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog · 3a38bb98
      Committed by Yonghong Song
      syzbot reported a possible deadlock in perf_event_detach_bpf_prog.
      The error details:
        ======================================================
        WARNING: possible circular locking dependency detected
        4.16.0-rc7+ #3 Not tainted
        ------------------------------------------------------
        syz-executor7/24531 is trying to acquire lock:
         (bpf_event_mutex){+.+.}, at: [<000000008a849b07>] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
      
        but task is already holding lock:
         (&mm->mmap_sem){++++}, at: [<0000000038768f87>] vm_mmap_pgoff+0x198/0x280 mm/util.c:353
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&mm->mmap_sem){++++}:
             __might_fault+0x13a/0x1d0 mm/memory.c:4571
             _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
             copy_to_user include/linux/uaccess.h:155 [inline]
             bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
             perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
             _perf_ioctl kernel/events/core.c:4750 [inline]
             perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        -> #0 (bpf_event_mutex){+.+.}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __mutex_lock_common kernel/locking/mutex.c:756 [inline]
             __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
             perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
             perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
             _free_event+0xbdb/0x10f0 kernel/events/core.c:4116
             put_event+0x24/0x30 kernel/events/core.c:4204
             perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
             remove_vma+0xb4/0x1b0 mm/mmap.c:172
             remove_vma_list mm/mmap.c:2490 [inline]
             do_munmap+0x82a/0xdf0 mm/mmap.c:2731
             mmap_region+0x59e/0x15a0 mm/mmap.c:1646
             do_mmap+0x6c0/0xe00 mm/mmap.c:1483
             do_mmap_pgoff include/linux/mm.h:2223 [inline]
             vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
             SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
             SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
             SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
             SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&mm->mmap_sem);
                                       lock(bpf_event_mutex);
                                       lock(&mm->mmap_sem);
          lock(bpf_event_mutex);
      
         *** DEADLOCK ***
        ======================================================
      
      The bug was introduced by commit f371b304 ("bpf/tracing: allow
      user space to query prog array on the same tp"), where copy_to_user,
      which requires mm->mmap_sem, is called while the bpf_event_mutex lock
      is held. At the same time, during perf_event file descriptor close,
      mm->mmap_sem is held first and the subsequent
      perf_event_detach_bpf_prog then needs the bpf_event_mutex lock.
      Such a scenario caused a deadlock.
      
      As suggested by Daniel, moving copy_to_user out of the
      bpf_event_mutex lock should fix the problem.
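
      A minimal sketch of the resulting pattern (the helper name and the
      variables here are illustrative, not the exact kernel code): gather the
      prog ids into a kernel-side buffer under bpf_event_mutex, drop the
      mutex, and only then call copy_to_user(), which may take mm->mmap_sem.

        u32 *ids;
        int ret;

        ids = kcalloc(ids_len, sizeof(u32), GFP_USER | __GFP_NOWARN);
        if (!ids)
                return -ENOMEM;

        mutex_lock(&bpf_event_mutex);
        /* copy prog ids into the kernel-side buffer only */
        ret = bpf_prog_array_copy_ids(event->tp_event->prog_array,
                                      ids, ids_len);   /* illustrative helper */
        mutex_unlock(&bpf_event_mutex);

        /* copy_to_user() may take mm->mmap_sem; bpf_event_mutex is no longer held */
        if (!ret && copy_to_user(uids, ids, ids_len * sizeof(u32)))
                ret = -EFAULT;
        kfree(ids);
        return ret;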
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3a38bb98
  2. 31 Mar 2018, 1 commit
    • bpf: Check attach type at prog load time · 5e43f899
      Committed by Andrey Ignatov
      == The problem ==
      
      There are use-cases when a program of some type can be attached to
      multiple attach points and those attach points must have different
      permissions to access context or to call helpers.
      
      E.g., the context structure may have fields for both IPv4 and IPv6, but it
      doesn't make sense to read from / write to the IPv6 field when the attach
      point is somewhere in the IPv4 stack.

      The same applies to BPF helpers: it may make sense to call some helper from
      one attach point, but not from another, for the same prog type.
      
      == The solution ==
      
      Introduce an `expected_attach_type` field in `union bpf_attr` for the
      `BPF_PROG_LOAD` command. If the scenario described in "The problem" section
      is the case for some prog type, the field will be checked twice:
      
      1) At load time the prog type is checked to see if the attach type for it
         must be known to validate program permissions correctly. The prog will
         be rejected with EINVAL if that's the case and `expected_attach_type`
         is not specified or has an invalid value.

      2) At attach time `attach_type` is compared with `expected_attach_type`,
         if the prog type requires one, and, if they differ, the attach will
         be rejected with EINVAL.
      
      The `expected_attach_type` is now available as part of `struct bpf_prog`
      in both `bpf_verifier_ops->is_valid_access()` and
      `bpf_verifier_ops->get_func_proto()`, and can be used to check context
      accesses and calls to helpers, respectively.
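
      For illustration, a hedged user-space sketch of how the new field is
      supplied at load time (the surrounding variables, the raw-syscall style
      and the headers are assumptions for the example, not part of this commit):

        /* needs <linux/bpf.h>, <sys/syscall.h>, <unistd.h> */
        union bpf_attr attr = {};

        attr.prog_type            = prog_type;    /* a type that requires an attach type */
        attr.expected_attach_type = attach_type;  /* re-checked at attach time */
        attr.insns                = (__u64)(unsigned long)insns;
        attr.insn_cnt             = insn_cnt;
        attr.license              = (__u64)(unsigned long)"GPL";

        prog_fd = syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
        /* attaching prog_fd later with an attach_type different from the one
         * given above is then rejected with EINVAL */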
      
      Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and
      Daniel Borkmann <daniel@iogearbox.net> here:
      https://marc.info/?l=linux-netdev&m=152107378717201&w=2
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      5e43f899
  3. 29 Mar 2018, 1 commit
    • bpf: introduce BPF_RAW_TRACEPOINT · c4f6699d
      Committed by Alexei Starovoitov
      Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
      kernel internal arguments of the tracepoints in their raw form.
      
      From the bpf program's point of view, access to the arguments looks like:
      struct bpf_raw_tracepoint_args {
             __u64 args[0];
      };
      
      int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
      {
        // program can read args[N] where N depends on tracepoint
        // and statically verified at program load+attach time
      }
      
      The kprobe+bpf infrastructure allows programs to access function arguments.
      This feature allows programs to access raw tracepoint arguments.
      
      Similar to the proposed 'dynamic ftrace events', there are no ABI guarantees
      as to what the tracepoint arguments are and what their meaning is.
      The program needs to type-cast the args properly and use the bpf_probe_read()
      helper to access struct fields when an argument is a pointer.
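
      A hedged sketch of such a program for the xdp_exception tracepoint shown
      below (the SEC() annotation, headers and the argument layout are
      assumptions for the example; the layout carries no ABI guarantee):

        /* clang -target bpf; assumes <linux/bpf.h> and a bpf_helpers.h that
         * provides SEC() and bpf_probe_read() */
        SEC("raw_tracepoint/xdp_exception")
        int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
        {
                /* args[0] = dev, args[1] = xdp prog, args[2] = act (see TP_PROTO below) */
                struct net_device *dev = (struct net_device *)ctx->args[0];
                __u64 act = ctx->args[2];
                int ifindex = 0;

                /* dev is a kernel pointer: read its fields via bpf_probe_read() */
                bpf_probe_read(&ifindex, sizeof(ifindex), &dev->ifindex);
                /* ... use ifindex and act ... */
                return 0;
        }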
      
      For every tracepoint __bpf_trace_##call function is prepared.
      In assembler it looks like:
      (gdb) disassemble __bpf_trace_xdp_exception
      Dump of assembler code for function __bpf_trace_xdp_exception:
         0xffffffff81132080 <+0>:     mov    %ecx,%ecx
         0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
      
      where
      
      TRACE_EVENT(xdp_exception,
              TP_PROTO(const struct net_device *dev,
                       const struct bpf_prog *xdp, u32 act),
      
      The above assembler snippet casts the 32-bit 'act' field into a 'u64'
      to pass into bpf_trace_run3(), while the 'dev' and 'xdp' args are passed as-is.
      All ~500 of the __bpf_trace_*() functions are only 5-10 bytes long,
      and in total this approach adds 7k bytes to .text.

      This approach gives the lowest possible overhead
      while calling trace_xdp_exception() from kernel C code and
      transitioning into bpf land.
      Since tracepoint+bpf is used at speeds of 1M+ events per second,
      this is a valuable optimization.
      
      The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
      that returns anon_inode FD of 'bpf-raw-tracepoint' object.
      
      The user space looks like:
      // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
      prog_fd = bpf_prog_load(...);
      // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
      raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
      
      Ctrl-C of tracing daemon or cmdline tool that uses this feature
      will automatically detach bpf program, unload it and
      unregister tracepoint probe.
      
      On the kernel side, the __bpf_raw_tp_map section of pointers to the
      tracepoint definition and to the __bpf_trace_*() probe function is used
      to find the tracepoint with the "xdp_exception" name and the
      corresponding __bpf_trace_xdp_exception() probe function,
      which are passed to tracepoint_probe_register() to connect the probe
      with the tracepoint.
      
      The addition of bpf_raw_tracepoint doesn't interfere with the ftrace and perf
      tracepoint mechanisms. perf_event_open() can be used in parallel
      on the same tracepoint.
      Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are permitted,
      each with its own bpf program. The kernel will execute
      all tracepoint probes and all attached bpf programs.
      
      In the future bpf_raw_tracepoints can be extended with
      query/introspection logic.
      
      __bpf_raw_tp_map section logic was contributed by Steven Rostedt.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c4f6699d
  4. 27 Mar 2018, 1 commit
  5. 26 Mar 2018, 1 commit
  6. 24 Mar 2018, 1 commit
  7. 21 Mar 2018, 1 commit
  8. 13 Mar 2018, 1 commit
  9. 08 Mar 2018, 1 commit
  10. 15 Feb 2018, 1 commit
    • bpf: fix bpf_prog_array_copy_to_user warning from perf event prog query · 9c481b90
      Committed by Daniel Borkmann
      syzkaller tried to perform a prog query in perf_event_query_prog_array()
      where struct perf_event_query_bpf had an ids_len of 1,073,741,353, thus
      causing a warning due to a failed kcalloc() allocation out of the
      bpf_prog_array_copy_to_user() helper. Given we cannot attach more than
      64 programs to a perf event, there's no point in allowing a huge ids_len.
      Therefore, allow a buffer that fits the maximum number of ids and
      also add __GFP_NOWARN to the temporary ids buffer.
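
      A hedged sketch of the resulting checks (the constant name and exact
      bound stand in for the per-event program limit; treat them as
      approximations of the actual code):

        if (query.ids_len > BPF_TRACE_MAX_PROGS)        /* e.g. 64 */
                return -E2BIG;

        /* a failed allocation is reported via -ENOMEM, so suppress the
         * kcalloc() warning for absurd, user-controlled sizes */
        ids = kcalloc(query.ids_len, sizeof(u32), GFP_USER | __GFP_NOWARN);
        if (!ids)
                return -ENOMEM;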
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Fixes: 0911287c ("bpf: fix bpf_prog_array_copy_to_user() issues")
      Reported-by: syzbot+cab5816b0edbabf598b3@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9c481b90
  11. 12 Feb 2018, 1 commit
    • vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Committed by Linus Torvalds
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  12. 08 Feb 2018, 2 commits
  13. 06 Feb 2018, 2 commits
  14. 24 Jan 2018, 7 commits
  15. 19 Jan 2018, 2 commits
    • tracing: Fix converting enum's from the map in trace_event_eval_update() · 1ebe1eaf
      Committed by Steven Rostedt (VMware)
      Since enums do not get converted by the TRACE_EVENT macro into their values,
      the event format displays the enum name and not the value. This breaks
      tools like perf and trace-cmd that need to interpret the raw binary data. To
      solve this, an enum map was created to convert these enums into their actual
      numbers on boot up. This is done by TRACE_EVENTS() adding a
      TRACE_DEFINE_ENUM() macro.
      
      Some enums were not being converted. This was caused by an optimization that
      had a bug in it.
      
      All calls get checked against this enum map to see if they should be converted
      or not, and it compares the call's system to the system that the enum map
      was created under. If they match, then the call is processed.
      
      To cut down on the number of iterations needed to find the maps with a
      matching system, since calls and maps are grouped by system, when a match is
      made, the index into the map array is saved, so that the next call, if it
      belongs to the same system as the previous call, could start right at that
      array index and not have to scan all the previous arrays.
      
      The problem was, the saved index was used as the variable to know if this is
      a call in a new system or not. If the index was zero, it was assumed that
      the call is in a new system and would keep incrementing the saved index
      until it found a matching system. The issue arises when the first matching
      system was at index zero. The next map, if it belonged to the same system,
      would then think it was the first match and increment the index to one. If
      the next call belongs to the same system, it would begin its search of the
      maps off by one, and miss the first enum that should be converted. This left
      a single enum not converted properly.
      
      Also add a comment to describe exactly what that index was for. It took me a
      bit too long to figure out what I was thinking when debugging this issue.
      
      Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com
      
      Cc: stable@vger.kernel.org
      Fixes: 0c564a53 ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
      Reported-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      1ebe1eaf
    • ring-buffer: Fix duplicate results in mapping context to bits in recursive lock · 0164e0d7
      Committed by Steven Rostedt (VMware)
      In bringing back the context checks, the code first checks if it's in normal
      (non-interrupt) context, and then for NMI, then IRQ, then softirq. The final
      check is redundant. Since the if branch is only hit if the context is one of
      NMI, IRQ, or SOFTIRQ, if it's not NMI or IRQ there's no reason to check if
      it is SOFTIRQ. The current code returns the same result even if it's not a
      SOFTIRQ, which is confusing.
      
        pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ
      
      Is redundant as RB_CTX_SOFTIRQ *is* 2!
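
      A hedged sketch of the branch in question (simplified from the ring-buffer
      recursion check; this code only runs when pc says the context is NMI,
      hardirq or softirq, so once NMI and hardirq are ruled out the answer is
      softirq anyway):

        if (pc & NMI_MASK)
                bit = RB_CTX_NMI;
        else if (pc & HARDIRQ_MASK)
                bit = RB_CTX_IRQ;
        else
                bit = RB_CTX_SOFTIRQ;   /* was: pc & SOFTIRQ_OFFSET ? 2 : RB_CTX_SOFTIRQ */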
      
      Fixes: a0e3a18f ("ring-buffer: Bring back context level recursive checks")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0164e0d7
  16. 18 Jan 2018, 1 commit
    • bpf: change fake_ip for bpf_trace_printk helper · eefa864a
      Committed by Yonghong Song
      Currently, for the bpf_trace_printk helper, the fake ip address 0x1
      is used, with comments saying that the fake ip will not be printed.
      This is indeed true for 4.12 and earlier versions, but for
      4.13 and later versions, the ip address will be printed if
      it cannot be resolved with kallsyms. Running the samples/bpf/tracex5
      program will give you the following in the debugfs
      trace_pipe output:
        ...
        <...>-1819  [003] ....   443.497877: 0x00000001: mmap
        <...>-1819  [003] ....   443.498289: 0x00000001: syscall=102 (one of get/set uid/pid/gid)
        ...
      
      The kernel commit that changed this behavior is:
        commit feaf1283
        Author: Steven Rostedt (VMware) <rostedt@goodmis.org>
        Date:   Thu Jun 22 17:04:55 2017 -0400
      
            tracing: Show address when function names are not found
        ...
      
      This patch changed the comment and also altered the fake ip
      address to 0x0 as users may think 0x1 has some special meaning
      while it doesn't. The new output:
        ...
        <...>-1799  [002] ....    25.953576: 0: mmap
        <...>-1799  [002] ....    25.953865: 0: read(fd=0, buf=00000000053936b5, size=512)
        ...
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      eefa864a
  17. 16 Jan 2018, 2 commits
    • tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y · 68e76e03
      Committed by Randy Dunlap
      I regularly get 50 MB - 60 MB files during kernel randconfig builds.
      These large files mostly contain many repeats (e.g., 124,594) of:
      
      In file included from ../include/linux/string.h:6:0,
                       from ../include/linux/uuid.h:20,
                       from ../include/linux/mod_devicetable.h:13,
                       from ../scripts/mod/devicetable-offsets.c:3:
      ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
          ______f = {     \
          ^
      ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                             ^
      ../include/linux/string.h:425:2: note: in expansion of macro 'if'
        if (p_size == (size_t)-1 && q_size == (size_t)-1)
        ^
      
      This only happens when CONFIG_FORTIFY_SOURCE=y and
      CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
      FORTIFY_SOURCE=y.
      
      Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      68e76e03
    • ring-buffer: Bring back context level recursive checks · a0e3a18f
      Committed by Steven Rostedt (VMware)
      Commit 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
      simpler") replaced the context level recursion checks with a simple counter.
      This would prevent the ring buffer code from recursively calling itself more
      than the max number of contexts that exist (Normal, softirq, irq, nmi). But
      this change caused a lockup in a specific case, which was during suspend and
      resume using a global clock. Adding a stack dump to see where this occurred,
      the issue was in the trace global clock itself:
      
        trace_buffer_lock_reserve+0x1c/0x50
        __trace_graph_entry+0x2d/0x90
        trace_graph_entry+0xe8/0x200
        prepare_ftrace_return+0x69/0xc0
        ftrace_graph_caller+0x78/0xa8
        queued_spin_lock_slowpath+0x5/0x1d0
        trace_clock_global+0xb0/0xc0
        ring_buffer_lock_reserve+0xf9/0x390
      
      The function graph tracer traced queued_spin_lock_slowpath, which was called
      by trace_clock_global. This pointed out that trace_clock_global() is not
      reentrant, as it takes a spin lock. It depended on the ring buffer recursive
      lock to keep that from happening.

      By removing the context detection and adding just a max number of allowable
      recursions, it allowed trace_clock_global() to be entered again and try
      to retake the spinlock it already held, causing a deadlock.
      
      Fixes: 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
      Reported-by: David Weinehall <david.weinehall@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      a0e3a18f
  18. 13 Jan 2018, 3 commits
  19. 28 Dec 2017, 5 commits
    • tracing: Fix possible double free on failure of allocating trace buffer · 4397f045
      Committed by Steven Rostedt (VMware)
      Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
      tracing buffer, memory is freed, but the pointers that point to it are not
      set back to NULL, and later paths may try to free the freed memory
      again. Jing and Chunyan fixed one of the locations that does this, but
      missed a spot.
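
      Roughly the shape of the missed spot, as a hedged sketch (field names
      approximate the kernel structures):

        ring_buffer_free(tr->trace_buffer.buffer);
        tr->trace_buffer.buffer = NULL;   /* so the outer cleanup path sees there
                                           * is nothing left and does not free
                                           * the same buffer a second time */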
      
      Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
      
      Cc: stable@vger.kernel.org
      Fixes: 737223fb ("tracing: Consolidate buffer allocation code")
      Reported-by: Jing Xia <jing.xia@spreadtrum.com>
      Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      4397f045
    • tracing: Fix crash when it fails to alloc ring buffer · 24f2aaf9
      Committed by Jing Xia
      A double free of the ring buffer happens when it fails to allocate a new
      ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
      The root cause is that the pointer is not set to NULL after the buffer
      is freed in allocate_trace_buffers(), and the freeing of the ring
      buffer is invoked again later if the pointer is not NULL,
      as:
      
      instance_mkdir()
        |-allocate_trace_buffers()
            |-allocate_trace_buffer(tr, &tr->trace_buffer...)
            |-allocate_trace_buffer(tr, &tr->max_buffer...)
            |     // allocation fails (-ENOMEM): first free,
            |     // and the buffer pointer is not set to NULL
            |-ring_buffer_free(tr->trace_buffer.buffer)
        // out_free_tr
        |-free_trace_buffers()
            |-free_trace_buffer(&tr->trace_buffer);
                // if trace_buffer is not NULL, free again
                |-ring_buffer_free(buf->buffer)
                    |-rb_free_cpu_buffer(buffer->buffers[cpu])
                        // ring_buffer_per_cpu is NULL, and
                        // crash in ring_buffer_per_cpu->pages
      
      Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
      
      Cc: stable@vger.kernel.org
      Fixes: 737223fb ("tracing: Consolidate buffer allocation code")
      Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
      Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      24f2aaf9
    • ring-buffer: Do no reuse reader page if still in use · ae415fa4
      Committed by Steven Rostedt (VMware)
      To free the reader page that is allocated with ring_buffer_alloc_read_page(),
      ring_buffer_free_read_page() must be called. For faster performance, this
      page can be reused by the ring buffer to avoid having to free and allocate
      new pages.
      
      The issue arises when the page is used with a splice pipe into the
      networking code. The networking code may up the page counter for the page
      and keep it active while it is queued to be sent to the network. The
      incrementing of the page ref does not prevent it from being reused in the
      ring buffer, and this can cause the page that is being sent out to the
      network to be modified, by new data being read into it, before it is sent.
      
      Add a check to the page ref counter, and only reuse the page if it is not
      being used anywhere else.
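
      A hedged sketch of the added check (locking omitted; names approximate
      the ring-buffer code):

        struct page *page = virt_to_page(bpage->data);

        /* e.g. a splice into the network stack may still hold a reference */
        if (page_ref_count(page) > 1)
                goto out;               /* don't recycle; just free it normally */

        if (!cpu_buffer->free_page) {
                cpu_buffer->free_page = bpage;  /* safe to reuse for a later read */
                bpage = NULL;
        }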
      
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      ae415fa4
    • tracing: Remove extra zeroing out of the ring buffer page · 6b7e633f
      Committed by Steven Rostedt (VMware)
      The ring_buffer_read_page() takes care of zeroing out any extra data in the
      page that it returns. There's no need to zero it out again from the
      consumer. It was removed from one consumer of this function, but
      read_buffers_splice_read() did not remove it, and worse, it contained a
      nasty bug because of it.
      
      Cc: stable@vger.kernel.org
      Fixes: 2711ca23 ("ring-buffer: Move zeroing out excess in page to ring buffer code")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      6b7e633f
    • ring-buffer: Mask out the info bits when returning buffer page length · 45d8b80c
      Committed by Steven Rostedt (VMware)
      Two info bits were added to the "commit" part of the ring buffer data page
      when returned to be consumed. This was to inform the user space readers that
      events have been missed, and that the count may be stored at the end of the
      page.
      
      What wasn't handled was the splice code that actually called a function to
      return the length of the data in order to zero out the rest of the page
      before sending it up to user space. These info bits were returned with the
      length, making the value negative, and that negative value was not checked.
      It was compared to PAGE_SIZE, and only used if the size was less than
      PAGE_SIZE. Luckily, PAGE_SIZE is an unsigned long, which made the compare an
      unsigned compare, meaning the negative size value did not end up causing a
      large portion of memory to be randomly zeroed out.
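
      A hedged sketch of the masking (the flag and field names are assumptions
      about the ring-buffer code, not quoted from the patch):

        /* the top bits of 'commit' carry missed-events info, not length */
        size_t len = local_read(&bpage->commit) &
                     ~(RB_MISSED_EVENTS | RB_MISSED_STORED);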
      
      Cc: stable@vger.kernel.org
      Fixes: 66a8cb95 ("ring-buffer: Add place holder recording of dropped events")
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      45d8b80c
  20. 18 Dec 2017, 1 commit
  21. 15 Dec 2017, 1 commit
  22. 14 Dec 2017, 1 commit
    • bpf/tracing: fix kernel/events/core.c compilation error · f4e2298e
      Committed by Yonghong Song
      Commit f371b304 ("bpf/tracing: allow user space to
      query prog array on the same tp") introduced a perf
      ioctl command to query prog array attached to the
      same perf tracepoint. The commit introduced a
      compilation error under certain config conditions, e.g.,
        (1). CONFIG_BPF_SYSCALL is not defined, or
        (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS
             nor CONFIG_KPROBE_EVENTS is defined.
      
      Error message:
        kernel/events/core.o: In function `perf_ioctl':
        core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array'
      
      This patch fixed this error by guarding the real definition under
      CONFIG_BPF_EVENTS and providing a static inline dummy function
      if CONFIG_BPF_EVENTS is not defined.
      It renamed the function from bpf_event_query_prog_array to
      perf_event_query_prog_array and moved the definition from linux/bpf.h
      to linux/trace_events.h so the definition is in proximity to
      other prog_array-related functions.
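
      The resulting pattern in linux/trace_events.h looks roughly like this
      (the stub's return value is an assumption for the sketch):

        #ifdef CONFIG_BPF_EVENTS
        int perf_event_query_prog_array(struct perf_event *event, void __user *info);
        #else
        static inline int
        perf_event_query_prog_array(struct perf_event *event, void __user *info)
        {
                return -EOPNOTSUPP;
        }
        #endif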
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f4e2298e
  23. 13 Dec 2017, 2 commits
    • bpf: fix corruption on concurrent perf_event_output calls · 283ca526
      Committed by Daniel Borkmann
      When tracing and networking programs are both attached in the
      system and both use event-output helpers that eventually call
      into perf_event_output(), then we could end up in a situation
      where the tracing attached program runs in user context while
      a cls_bpf program is triggered on that same CPU out of softirq
      context.
      
      Since both rely on the same per-cpu perf_sample_data, we could
      potentially corrupt it. This can only ever happen in a combination
      of the two types; all tracing programs use a bpf_prog_active
      counter to bail out in case a program is already running on
      that CPU out of a different context. XDP and cls_bpf programs
      by themselves don't have this issue as they run in the same
      context only. Therefore, split both perf_sample_data so they
      cannot be accessed from each other.
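
      A hedged sketch of the split (variable names are approximations):

        /* one scratch perf_sample_data per path, so a tracing program in task
         * context and a cls_bpf program in softirq context on the same CPU no
         * longer share state */
        static DEFINE_PER_CPU(struct perf_sample_data, bpf_trace_sd);  /* bpf_perf_event_output() */
        static DEFINE_PER_CPU(struct perf_sample_data, bpf_misc_sd);   /* bpf_event_output()      */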
      
      Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data")
      Reported-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Tested-by: Song Liu <songliubraving@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      283ca526
    • bpf: add a bpf_override_function helper · 9802d865
      Committed by Josef Bacik
      Error injection is sloppy and very ad-hoc.  BPF could fill this niche
      perfectly with its kprobe functionality.  We could make sure errors are
      only triggered in specific call chains that we care about in very
      specific situations.  Accomplish this with the bpf_override_function
      helper.  This will modify the probed caller's return value to the
      specified value and set the PC to an override function that simply
      returns, bypassing the originally probed function.  This gives us a nice
      clean way to implement systematic error injection for all of our code
      paths.
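
      As a hedged illustration only (the helper is exposed to programs as
      bpf_override_return(); the SEC() annotation, the probed function and the
      headers are assumptions for the example, not part of this commit):

        /* clang -target bpf; assumes <linux/bpf.h> and a bpf_helpers.h */
        SEC("kprobe/open_ctree")
        int override_open_ctree(struct pt_regs *ctx)
        {
                unsigned long rc = -12;         /* -ENOMEM */

                /* make the probed function return -ENOMEM without executing it */
                bpf_override_return(ctx, rc);
                return 0;
        }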
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      9802d865