1. 30 April 2018 (1 commit)
  2. 29 April 2018 (1 commit)
    • bpf: add bpf_get_stack helper · c195651e
      Committed by Yonghong Song
      Currently, the stackmap and the bpf_get_stackid helper are provided
      for bpf programs to get the stack trace. This approach has
      a limitation, though: if two stack traces have the same hash,
      only one will get stored in the stackmap table,
      so some stack traces are missing from the user's perspective.
      
      This patch implements a new helper, bpf_get_stack, which sends
      stack traces directly to the bpf program. The bpf program
      is able to see all stack traces, and then can do in-kernel
      processing or send stack traces to user space through
      a shared map or bpf_perf_event_output.
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c195651e
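      A minimal sketch of how a program might use the helper described above,
      assuming libbpf-style macros; the kprobe target, map layout, and stack
      depth are illustrative and not part of the commit:

        #include <linux/bpf.h>
        #include <linux/ptrace.h>
        #include <bpf/bpf_helpers.h>

        struct {
                __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
                __uint(key_size, sizeof(__u32));
                __uint(value_size, sizeof(__u32));
        } events SEC(".maps");

        SEC("kprobe/do_sys_open")
        int dump_stack(struct pt_regs *ctx)
        {
                __u64 buf[64];
                /* bpf_get_stack() fills buf with raw instruction pointers
                 * and returns the number of bytes written, or a negative
                 * error. */
                long n = bpf_get_stack(ctx, buf, sizeof(buf), 0);

                if (n < 0)
                        return 0;
                /* Every trace reaches user space; nothing is hashed away
                 * as with stackmap/bpf_get_stackid. */
                bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                                      buf, sizeof(buf));
                return 0;
        }

        char LICENSE[] SEC("license") = "GPL";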
  3. 27 April 2018 (4 commits)
    • tracing: Add field modifier parsing hist error for hist triggers · dcf23457
      Committed by Tom Zanussi
      If the user specifies an invalid field modifier for a hist trigger,
      the current code correctly flags that as an error, but doesn't tell
      the user what happened.
      
      Fix this by invoking hist_err() with an appropriate message when
      invalid modifiers are specified.
      
      Before:
      
        # echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        -su: echo: write error: Invalid argument
        # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
      
      After:
      
        # echo 'hist:keys=pid:ts0=common_timestamp.junkusecs' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        -su: echo: write error: Invalid argument
        # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/hist
        ERROR: Invalid field modifier: junkusecs
          Last command: keys=pid:ts0=common_timestamp.junkusecs
      
      Link: http://lkml.kernel.org/r/b043c59fa79acd06a5f14a1d44dee9e5a3cd1248.1524790601.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      dcf23457
    • tracing: Add field parsing hist error for hist triggers · 5ec432d7
      Committed by Tom Zanussi
      If the user specifies a nonexistent field for a hist trigger, the
      current code correctly flags that as an error, but doesn't tell the
      user what happened.
      
      Fix this by invoking hist_err() with an appropriate message when
      nonexistent fields are specified.
      
      Before:
      
        # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
        -su: echo: write error: Invalid argument
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
      
      After:
      
        # echo 'hist:keys=pid:ts0=common_timestamp.usecs' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger
        -su: echo: write error: Invalid argument
        # cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist
        ERROR: Couldn't find field: pid
          Last command: keys=pid:ts0=common_timestamp.usecs
      
      Link: http://lkml.kernel.org/r/fdc8746969d16906120f162b99dd71c741e0b62c.1524790601.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      5ec432d7
    • tracing: Restore proper field flag printing when displaying triggers · 608940da
      Committed by Tom Zanussi
      The flag-printing code used when displaying hist triggers somehow got
      dropped during refactoring of the inter-event patchset.  This restores
      it.
      
      Below are a couple of examples. In the first case, .usecs wasn't being
      displayed properly for common_timestamp, and the second illustrates
      the same for other flags such as .execname.
      
      Before:
      
        # echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
        # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
        hist:keys=common_pid:vals=hitcount,count:sort=count:size=2048 [active]
      
        # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        hist:keys=pid:vals=hitcount:ts0=common_timestamp:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]
      
      After:
      
        # echo 'hist:key=common_pid.execname:val=count:sort=count' > /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
        # cat /sys/kernel/debug/tracing/events/syscalls/sys_enter_read/trigger
        hist:keys=common_pid.execname:vals=hitcount,count:sort=count:size=2048 [active]
      
        # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        # cat /sys/kernel/debug/tracing/events/sched/sched_wakeup/trigger
        hist:keys=pid:vals=hitcount:ts0=common_timestamp.usecs:sort=hitcount:size=2048:clock=global if comm=="cyclictest" [active]
      
      Link: http://lkml.kernel.org/r/492bab42ff21806600af98a8ea901af10efbee0c.1524790601.git.tom.zanussi@linux.intel.com
      Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      608940da
    • tracing: Fix bad use of igrab in trace_uprobe.c · 0c92c7a3
      Committed by Song Liu
      As Miklos reported and suggested:
      
        This pattern repeats two times in trace_uprobe.c and in
        kernel/events/core.c as well:
      
            ret = kern_path(filename, LOOKUP_FOLLOW, &path);
            if (ret)
                goto fail_address_parse;
      
            inode = igrab(d_inode(path.dentry));
            path_put(&path);
      
        And it's wrong.  You can only hold a reference to the inode if you
        have an active ref to the superblock as well (which is normally
        through path.mnt) or holding s_umount.
      
        This way unmounting the containing filesystem while the tracepoint is
        active will give you the "VFS: Busy inodes after unmount..." message
        and a crash when the inode is finally put.
      
        Solution: store path instead of inode.
      
      This patch fixes two instances in trace_uprobe.c. struct path is added to
      struct trace_uprobe to keep the inode and containing mount point
      referenced.
      
      Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com
      
      Fixes: f3f096cf ("tracing: Provide trace events interface for uprobes")
      Fixes: 33ea4b24 ("perf/core: Implement the 'perf_uprobe' PMU")
      Cc: stable@vger.kernel.org
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Howard McLauchlan <hmclauchlan@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Miklos Szeredi <mszeredi@redhat.com>
      Reported-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0c92c7a3
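      A sketch of the corrected pattern described above; the function name and
      error handling are illustrative, not the exact patch:

        /* Holding a struct path pins the vfsmount, and through it the
         * superblock, so the inode below cannot outlive its filesystem. */
        struct trace_uprobe {
                struct path path;       /* replaces: struct inode *inode */
                /* ... */
        };

        static int trace_uprobe_set_path(struct trace_uprobe *tu,
                                         const char *filename)
        {
                int ret = kern_path(filename, LOOKUP_FOLLOW, &tu->path);

                if (ret)
                        return ret;
                /* d_inode(tu->path.dentry) stays valid while tu->path is
                 * held; release with path_put(&tu->path) when the probe
                 * is freed. */
                return 0;
        }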
  4. 26 April 2018 (1 commit)
    • Revert: Unify CLOCK_MONOTONIC and CLOCK_BOOTTIME · a3ed0e43
      Committed by Thomas Gleixner
      Revert commits
      
      92af4dcb ("tracing: Unify the "boot" and "mono" tracing clocks")
      127bfa5f ("hrtimer: Unify MONOTONIC and BOOTTIME clock behavior")
      7250a404 ("posix-timers: Unify MONOTONIC and BOOTTIME clock behavior")
      d6c7270e ("timekeeping: Remove boot time specific code")
      f2d6fdbf ("Input: Evdev - unify MONOTONIC and BOOTTIME clock behavior")
      d6ed449a ("timekeeping: Make the MONOTONIC clock behave like the BOOTTIME clock")
      72199320 ("timekeeping: Add the new CLOCK_MONOTONIC_ACTIVE clock")
      
      As stated in the pull request for the unification of CLOCK_MONOTONIC and
      CLOCK_BOOTTIME, it was clear that we might have to revert the change.
      
      As reported by several folks, systemd and other applications rely on the
      documented behaviour of CLOCK_MONOTONIC on Linux and break with the above
      changes. After resume, daemons time out and other timeout-related issues
      are observed. Rafael compiled this list:
      
      * systemd kills daemons on resume, after >WatchdogSec seconds
        of suspending (Genki Sky).  [Verified that that's because systemd uses
        CLOCK_MONOTONIC and expects it to not include the suspend time.]
      
      * systemd-journald misbehaves after resume:
        systemd-journald[7266]: File /var/log/journal/016627c3c4784cd4812d4b7e96a34226/system.journal
      corrupted or uncleanly shut down, renaming and replacing.
        (Mike Galbraith).
      
      * NetworkManager reports "networking disabled" and networking is broken
        after resume 50% of the time (Pavel).  [May be because of systemd.]
      
      * MATE desktop dims the display and starts the screensaver right after
        system resume (Pavel).
      
      * Full system hang during resume (me).  [May be due to systemd or NM or both.]
      
      That happens on Debian and openSUSE systems.
      
      It's sad that these problems were neither caught in -next nor by those
      folks who expressed interest in this change.
      Reported-by: Rafael J. Wysocki <rjw@rjwysocki.net>
      Reported-by: Genki Sky <sky@genki.is>
      Reported-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kevin Easton <kevin@guarana.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Salyzyn <salyzyn@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      a3ed0e43
  5. 25 April 2018 (2 commits)
  6. 17 April 2018 (1 commit)
  7. 11 April 2018 (7 commits)
    • tracing: Enforce passing in filter=NULL to create_filter() · 0b3dec05
      Committed by Steven Rostedt (VMware)
      There's some inconsistency in what the output parameter filterp
      should be set to when passed to create_filter(..., struct event_filter **filterp).
      
      Whatever filterp points to should be NULL when calling this function.
      create_filter() calls create_filter_start() with a pointer to a local
      "filter" variable that is set to NULL, and create_filter_start() has a
      WARN_ON() if the passed-in pointer isn't pointing to a value set to NULL.
      
      Ideally, create_filter() should pass the filterp variable it received
      to create_filter_start() instead of hiding it behind a local variable.
      Hiding it allowed create_filter() to fail without updating the passed-in
      filter; the caller of create_filter() then tried to free filter, which
      was never initialized to anything, causing memory corruption.
      
      Link: http://lkml.kernel.org/r/00000000000032a0c30569916870@google.com
      
      Fixes: 80765597 ("tracing: Rewrite filter logic to be simpler and faster")
      Reported-by: syzbot+dadcc936587643d7f568@syzkaller.appspotmail.com
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0b3dec05
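      A sketch of the calling convention this enforces; the caller shown is
      illustrative:

        struct event_filter *filter = NULL;     /* must be NULL on entry */
        int err;

        err = create_filter(call, filter_str, true, &filter);
        if (err)
                /* With this change *filterp is updated even on failure,
                 * so the error path frees a real (or NULL) filter rather
                 * than uninitialized garbage. */
                free_event_filter(filter);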
    • trace_uprobe: Simplify probes_seq_show() · a64b2c01
      Committed by Ravi Bangoria
      Simplify the probes_seq_show() function. There is no change in output
      before and after the patch.
      
      Link: http://lkml.kernel.org/r/20180315082756.9050-2-ravi.bangoria@linux.vnet.ibm.com
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      a64b2c01
    • trace_uprobe: Use %lx to display offset · 18d45b11
      Committed by Ravi Bangoria
      tu->offset is an unsigned long, not a pointer, so %lx should
      be used to print it, not %px.
      
      Link: http://lkml.kernel.org/r/20180315082756.9050-1-ravi.bangoria@linux.vnet.ibm.com
      
      Cc: stable@vger.kernel.org
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Fixes: 0e4d819d ("trace_uprobe: Display correct offset in uprobe_events")
      Suggested-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      18d45b11
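      The change boils down to a format-string fix along these lines (the
      exact seq_printf() arguments are paraphrased, not quoted from the patch):

        /* tu->offset holds a file offset in an unsigned long, not a
         * kernel pointer, so print it as plain zero-padded hex instead
         * of using the %px pointer format. */
        seq_printf(m, "0x%0*lx", (int)(sizeof(void *) * 2), tu->offset);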
    • tracing/uprobe: Add support for overlayfs · f0a2aa5a
      Committed by Howard McLauchlan
      uprobes cannot successfully attach to binaries located in a directory
      mounted with overlayfs.
      
      To verify, create directories for mounting overlayfs
      (upper, lower, work, merge), move some binary into merge/ and use readelf
      to obtain some known instruction of the binary. I used /bin/true and the
      entry instruction (0x13b0):
      
      	$ mount -t overlay overlay -o lowerdir=lower,upperdir=upper,workdir=work merge
      	$ cd /sys/kernel/debug/tracing
      	$ echo 'p:true_entry PATH_TO_MERGE/merge/true:0x13b0' > uprobe_events
      	$ echo 1 > events/uprobes/true_entry/enable
      
      This returns 'bash: echo: write error: Input/output error' and dmesg
      tells us 'event trace: Could not enable event true_entry'.
      
      This change makes create_trace_uprobe() look for the real inode of a
      dentry. In the case of normal filesystems, this simplifies to just
      returning the inode. In the case of overlayfs (and similar filesystems) we will
      obtain the underlying dentry and corresponding inode, upon which uprobes
      can successfully register.
      
      Running the example above with the patch applied, we can see that the
      uprobe is enabled and will output to trace as expected.
      
      Link: http://lkml.kernel.org/r/20180410231030.2720-1-hmclauchlan@fb.com
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      f0a2aa5a
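      A sketch of the lookup change, reusing the kern_path() pattern quoted in
      the igrab fix entry above; d_real_inode() resolves through overlayfs to
      the underlying inode (surrounding variables are illustrative):

        ret = kern_path(filename, LOOKUP_FOLLOW, &path);
        if (ret)
                goto fail_address_parse;

        /* was: inode = igrab(d_inode(path.dentry));
         * d_real_inode() returns the real underlying inode for overlayfs
         * (and similar) dentries, which uprobes can register against. */
        inode = igrab(d_real_inode(path.dentry));
        path_put(&path);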
    • tracing: Use ARRAY_SIZE() macro instead of open coding it · 0a4d0564
      Committed by Jérémy Lefaure
      It is useless to re-invent the ARRAY_SIZE() macro, so let's use it
      instead of DATA_CNT.
      
      Found with Coccinelle with the following semantic patch:
      @r depends on (org || report)@
      type T;
      T[] E;
      position p;
      @@
      (
       (sizeof(E)@p /sizeof(*E))
      |
       (sizeof(E)@p /sizeof(E[...]))
      |
       (sizeof(E)@p /sizeof(T))
      )
      
      Link: http://lkml.kernel.org/r/20171016012250.26453-1-jeremy.lefaure@lse.epita.fr
      Signed-off-by: Jérémy Lefaure <jeremy.lefaure@lse.epita.fr>
      [ Removed useless include of kernel.h ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      0a4d0564
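      An illustrative before/after of the substitution (the array, loop, and
      use() call are made up for the example):

        static const int vals[] = { 1, 2, 3, 4 };
        int i;

        /* before: #define DATA_CNT (sizeof(vals) / sizeof(int))
         *         for (i = 0; i < DATA_CNT; i++) ... */
        for (i = 0; i < ARRAY_SIZE(vals); i++)
                use(vals[i]);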
    • bpf/tracing: fix a deadlock in perf_event_detach_bpf_prog · 3a38bb98
      Committed by Yonghong Song
      syzbot reported a possible deadlock in perf_event_detach_bpf_prog.
      The error details:
        ======================================================
        WARNING: possible circular locking dependency detected
        4.16.0-rc7+ #3 Not tainted
        ------------------------------------------------------
        syz-executor7/24531 is trying to acquire lock:
         (bpf_event_mutex){+.+.}, at: [<000000008a849b07>] perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
      
        but task is already holding lock:
         (&mm->mmap_sem){++++}, at: [<0000000038768f87>] vm_mmap_pgoff+0x198/0x280 mm/util.c:353
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #1 (&mm->mmap_sem){++++}:
             __might_fault+0x13a/0x1d0 mm/memory.c:4571
             _copy_to_user+0x2c/0xc0 lib/usercopy.c:25
             copy_to_user include/linux/uaccess.h:155 [inline]
             bpf_prog_array_copy_info+0xf2/0x1c0 kernel/bpf/core.c:1694
             perf_event_query_prog_array+0x1c7/0x2c0 kernel/trace/bpf_trace.c:891
             _perf_ioctl kernel/events/core.c:4750 [inline]
             perf_ioctl+0x3e1/0x1480 kernel/events/core.c:4770
             vfs_ioctl fs/ioctl.c:46 [inline]
             do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:686
             SYSC_ioctl fs/ioctl.c:701 [inline]
             SyS_ioctl+0x8f/0xc0 fs/ioctl.c:692
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        -> #0 (bpf_event_mutex){+.+.}:
             lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
             __mutex_lock_common kernel/locking/mutex.c:756 [inline]
             __mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
             perf_event_detach_bpf_prog+0x92/0x3d0 kernel/trace/bpf_trace.c:854
             perf_event_free_bpf_prog kernel/events/core.c:8147 [inline]
             _free_event+0xbdb/0x10f0 kernel/events/core.c:4116
             put_event+0x24/0x30 kernel/events/core.c:4204
             perf_mmap_close+0x60d/0x1010 kernel/events/core.c:5172
             remove_vma+0xb4/0x1b0 mm/mmap.c:172
             remove_vma_list mm/mmap.c:2490 [inline]
             do_munmap+0x82a/0xdf0 mm/mmap.c:2731
             mmap_region+0x59e/0x15a0 mm/mmap.c:1646
             do_mmap+0x6c0/0xe00 mm/mmap.c:1483
             do_mmap_pgoff include/linux/mm.h:2223 [inline]
             vm_mmap_pgoff+0x1de/0x280 mm/util.c:355
             SYSC_mmap_pgoff mm/mmap.c:1533 [inline]
             SyS_mmap_pgoff+0x462/0x5f0 mm/mmap.c:1491
             SYSC_mmap arch/x86/kernel/sys_x86_64.c:100 [inline]
             SyS_mmap+0x16/0x20 arch/x86/kernel/sys_x86_64.c:91
             do_syscall_64+0x281/0x940 arch/x86/entry/common.c:287
             entry_SYSCALL_64_after_hwframe+0x42/0xb7
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
               CPU0                    CPU1
               ----                    ----
          lock(&mm->mmap_sem);
                                       lock(bpf_event_mutex);
                                       lock(&mm->mmap_sem);
          lock(bpf_event_mutex);
      
         *** DEADLOCK ***
        ======================================================
      
      The bug was introduced by commit f371b304 ("bpf/tracing: allow
      user space to query prog array on the same tp"), where copy_to_user,
      which requires mm->mmap_sem, is called inside the bpf_event_mutex lock.
      At the same time, during perf_event file descriptor close,
      mm->mmap_sem is held first and then the subsequent
      perf_event_detach_bpf_prog needs the bpf_event_mutex lock.
      Such a scenario caused a deadlock.
      
      As suggested by Daniel, moving copy_to_user out of the
      bpf_event_mutex lock should fix the problem.
      
      Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp")
      Reported-by: syzbot+dc5ca0e4c9bfafaf2bae@syzkaller.appspotmail.com
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      3a38bb98
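      A sketch of the reordering; the helper and variable names below are
      illustrative, not the exact patch:

        /* Snapshot the prog ids into a kernel buffer under the mutex... */
        mutex_lock(&bpf_event_mutex);
        cnt = copy_prog_ids_to_kbuf(progs, ids, ids_len); /* kernel memory only */
        mutex_unlock(&bpf_event_mutex);

        /* ...and only then touch user memory: copy_to_user() may fault
         * and take mm->mmap_sem, which must not nest inside
         * bpf_event_mutex (see the lockdep chain above). */
        if (copy_to_user(uids, ids, cnt * sizeof(u32)))
                return -EFAULT;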
    • tracing/uprobe_event: Fix strncpy corner case · 50268a3d
      Committed by Masami Hiramatsu
      Fix the string fetch function to terminate the copied string with NUL.
      It is OK to drop the rest of the string.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org
      Cc: 范龙飞 <long7573@126.com>
      Fixes: 5baaa59e ("tracing/probes: Implement 'memory' fetch method for uprobes")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      50268a3d
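      The corner case, sketched with illustrative variable names:
      strncpy_from_user() leaves the buffer unterminated when the source
      string fills it completely:

        long ret = strncpy_from_user(dst, src, maxlen);

        if (ret == maxlen)
                /* Source filled the buffer: drop the last byte of the
                 * string so the stored record is always NUL-terminated. */
                dst[ret - 1] = '\0';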
  8. 10 April 2018 (2 commits)
    • perf/core: Fix perf_uprobe_init() · 0eadcc7a
      Committed by Song Liu
      Similarly to the kprobe PMU fix in perf_kprobe_init(), fix error
      handling in perf_uprobe_init() as well.
      Reported-by: 范龙飞 <long7573@126.com>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e12f03d7 ("perf/core: Implement the 'perf_kprobe' PMU")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0eadcc7a
    • perf/core: Fix perf_kprobe_init() · 5da13ab8
      Committed by Masami Hiramatsu
      Fix error handling in perf_kprobe_init():
      
      	==================================================================
      	BUG: KASAN: slab-out-of-bounds in strlen+0x8e/0xa0 lib/string.c:482
      	Read of size 1 at addr ffff88003f9cc5c0 by task syz-executor2/23095
      
      	CPU: 0 PID: 23095 Comm: syz-executor2 Not tainted 4.16.0+ #24
      	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
      	Call Trace:
      	 __dump_stack lib/dump_stack.c:77 [inline]
      	 dump_stack+0xca/0x13e lib/dump_stack.c:113
      	 print_address_description+0x6e/0x2c0 mm/kasan/report.c:256
      	 kasan_report_error mm/kasan/report.c:354 [inline]
      	 kasan_report+0x256/0x380 mm/kasan/report.c:412
      	 strlen+0x8e/0xa0 lib/string.c:482
      	 kstrdup+0x21/0x70 mm/util.c:55
      	 alloc_trace_kprobe+0xc8/0x930 kernel/trace/trace_kprobe.c:325
      	 create_local_trace_kprobe+0x4f/0x3a0 kernel/trace/trace_kprobe.c:1438
      	 perf_kprobe_init+0x149/0x1f0 kernel/trace/trace_event_perf.c:264
      	 perf_kprobe_event_init+0xa8/0x120 kernel/events/core.c:8407
      	 perf_try_init_event+0xcb/0x2a0 kernel/events/core.c:9719
      	 perf_init_event kernel/events/core.c:9750 [inline]
      	 perf_event_alloc+0x1367/0x1e20 kernel/events/core.c:10022
      	 SYSC_perf_event_open+0x242/0x2330 kernel/events/core.c:10477
      	 do_syscall_64+0x198/0x640 arch/x86/entry/common.c:287
      	 entry_SYSCALL_64_after_hwframe+0x42/0xb7
      Reported-by: 范龙飞 <long7573@126.com>
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: e12f03d7 ("perf/core: Implement the 'perf_kprobe' PMU")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5da13ab8
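      A sketch of the hardened copy-in that avoids handing an unterminated
      buffer to kstrdup()/strlen() in alloc_trace_kprobe(); the shape follows
      strncpy_from_user() semantics, with illustrative labels:

        func = kzalloc(KSYM_NAME_LEN, GFP_KERNEL);
        if (!func)
                return -ENOMEM;
        ret = strncpy_from_user(func, u64_to_user_ptr(attr->kprobe_func),
                                KSYM_NAME_LEN);
        if (ret == KSYM_NAME_LEN)
                ret = -E2BIG;   /* name too long: never NUL-terminated */
        if (ret < 0)
                goto out;       /* also covers -EFAULT */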
  9. 06 April 2018 (14 commits)
  10. 31 March 2018 (1 commit)
    • bpf: Check attach type at prog load time · 5e43f899
      Committed by Andrey Ignatov
      == The problem ==
      
      There are use-cases when a program of some type can be attached to
      multiple attach points and those attach points must have different
      permissions to access context or to call helpers.
      
      E.g. the context structure may have fields for both IPv4 and IPv6, but it
      doesn't make sense to read from / write to the IPv6 field when the attach
      point is somewhere in the IPv4 stack.
      
      The same applies to BPF helpers: it may make sense to call some helper
      from one attach point, but not from another, for the same prog type.
      
      == The solution ==
      
      Introduce an `expected_attach_type` field in `struct bpf_attr` for the
      `BPF_PROG_LOAD` command. If the scenario described in "The problem"
      section applies to some prog type, the field will be checked twice:
      
      1) At load time, the prog type is checked to see if the attach type for
         it must be known to validate program permissions correctly. The prog
         will be rejected with EINVAL if that's the case and
         `expected_attach_type` is not specified or has an invalid value.
      
      2) At attach time, `attach_type` is compared with `expected_attach_type`
         if the prog type requires one, and, if they differ, the attach will
         be rejected with EINVAL.
      
      The `expected_attach_type` is now available as part of `struct bpf_prog`
      in both `bpf_verifier_ops->is_valid_access()` and
      `bpf_verifier_ops->get_func_proto()`, and can be used to check context
      accesses and calls to helpers correspondingly.
      
      Initially the idea was discussed by Alexei Starovoitov <ast@fb.com> and
      Daniel Borkmann <daniel@iogearbox.net> here:
      https://marc.info/?l=linux-netdev&m=152107378717201&w=2
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      5e43f899
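      A user-space sketch of supplying the new field at load time; the prog
      and attach types shown are illustrative:

        #include <linux/bpf.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        static int load_prog(const struct bpf_insn *insns,
                             unsigned int insn_cnt)
        {
                union bpf_attr attr = {};

                attr.prog_type            = BPF_PROG_TYPE_CGROUP_SOCK_ADDR;
                attr.expected_attach_type = BPF_CGROUP_INET4_BIND;
                attr.insns    = (__u64)(unsigned long)insns;
                attr.insn_cnt = insn_cnt;
                attr.license  = (__u64)(unsigned long)"GPL";

                /* EINVAL if this prog type requires an attach type and
                 * none (or an invalid one) is given; the same value is
                 * compared again at attach time. */
                return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
        }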
  11. 29 March 2018 (1 commit)
    • bpf: introduce BPF_RAW_TRACEPOINT · c4f6699d
      Committed by Alexei Starovoitov
      Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
      kernel internal arguments of the tracepoints in their raw form.
      
      From the bpf program's point of view, the access to the arguments looks like:
      struct bpf_raw_tracepoint_args {
             __u64 args[0];
      };
      
      int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
      {
        // program can read args[N] where N depends on tracepoint
        // and statically verified at program load+attach time
      }
      
      The kprobe+bpf infrastructure allows programs to access function
      arguments. This feature allows programs to access raw tracepoint
      arguments.
      
      Similar to the proposed 'dynamic ftrace events', there are no ABI
      guarantees as to what the tracepoint arguments are and what their
      meaning is. The program needs to type cast args properly and use the
      bpf_probe_read() helper to access struct fields when an argument is
      a pointer.
      
      For every tracepoint __bpf_trace_##call function is prepared.
      In assembler it looks like:
      (gdb) disassemble __bpf_trace_xdp_exception
      Dump of assembler code for function __bpf_trace_xdp_exception:
         0xffffffff81132080 <+0>:     mov    %ecx,%ecx
         0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>
      
      where
      
      TRACE_EVENT(xdp_exception,
              TP_PROTO(const struct net_device *dev,
                       const struct bpf_prog *xdp, u32 act),
      
      The above assembler snippet is casting the 32-bit 'act' field into 'u64'
      to pass into bpf_trace_run3(), while the 'dev' and 'xdp' args are passed as-is.
      All ~500 of the __bpf_trace_*() functions are only 5-10 bytes long,
      and in total this approach adds 7k bytes to .text.
      
      This approach gives the lowest possible overhead
      when calling trace_xdp_exception() from kernel C code and
      transitioning into bpf land.
      Since tracepoint+bpf is used at speeds of 1M+ events per second,
      this is a valuable optimization.
      
      The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced;
      it returns an anon_inode FD of a 'bpf-raw-tracepoint' object.
      
      The user space looks like:
      // load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
      prog_fd = bpf_prog_load(...);
      // receive anon_inode fd for given bpf_raw_tracepoint with prog attached
      raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);
      
      Ctrl-C of tracing daemon or cmdline tool that uses this feature
      will automatically detach bpf program, unload it and
      unregister tracepoint probe.
      
      On the kernel side the __bpf_raw_tp_map section of pointers to
      tracepoint definition and to __bpf_trace_*() probe function is used
      to find a tracepoint with "xdp_exception" name and
      corresponding __bpf_trace_xdp_exception() probe function
      which are passed to tracepoint_probe_register() to connect probe
      with tracepoint.
      
      Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
      tracepoint mechanisms. perf_event_open() can be used in parallel
      on the same tracepoint.
      Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) calls are
      permitted, each with its own bpf program. The kernel will execute
      all tracepoint probes and all attached bpf programs.
      
      In the future bpf_raw_tracepoints can be extended with
      query/introspection logic.
      
      The __bpf_raw_tp_map section logic was contributed by Steven Rostedt.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c4f6699d
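      A sketch of a program for the xdp_exception tracepoint quoted above,
      using a libbpf-style section name (an assumption of this example, not
      part of the commit):

        SEC("raw_tracepoint/xdp_exception")
        int on_xdp_exception(struct bpf_raw_tracepoint_args *ctx)
        {
                /* Per TP_PROTO(dev, xdp, act): args[2] is the 32-bit act,
                 * zero-extended to u64 by the __bpf_trace stub shown
                 * above. */
                __u32 act = (__u32)ctx->args[2];

                /* args[0] and args[1] are raw kernel pointers; dereference
                 * them only via bpf_probe_read(). No ABI is guaranteed. */
                return act == 0 ? 1 : 0;        /* e.g. count XDP_ABORTED */
        }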
  12. 27 March 2018 (1 commit)
  13. 26 March 2018 (1 commit)
  14. 24 March 2018 (1 commit)
  15. 23 March 2018 (1 commit)
  16. 21 March 2018 (1 commit)