1. 17 11月, 2017 1 次提交
  2. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  3. 30 10月, 2017 1 次提交
  4. 27 10月, 2017 9 次提交
  5. 10 10月, 2017 2 次提交
  6. 29 9月, 2017 2 次提交
  7. 21 9月, 2017 1 次提交
    • Y
      bpf: one perf event close won't free bpf program attached by another perf event · ec9dd352
      Yonghong Song 提交于
      This patch fixes a bug exhibited by the following scenario:
        1. fd1 = perf_event_open with attr.config = ID1
        2. attach bpf program prog1 to fd1
        3. fd2 = perf_event_open with attr.config = ID1
           <this will be successful>
        4. user program closes fd2 and prog1 is detached from the tracepoint.
        5. user program with fd1 does not work properly as tracepoint
           no output any more.
      
      The issue happens at step 4. Multiple perf_event_open can be called
      successfully, but only one bpf prog pointer in the tp_event. In the
      current logic, any fd release for the same tp_event will free
      the tp_event->prog.
      
      The fix is to free tp_event->prog only when the closing fd
      corresponds to the one which registered the program.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec9dd352
  8. 01 9月, 2017 1 次提交
    • E
      mm, uprobes: fix multiple free of ->uprobes_state.xol_area · 355627f5
      Eric Biggers 提交于
      Commit 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for
      write killable") made it possible to kill a forking task while it is
      waiting to acquire its ->mmap_sem for write, in dup_mmap().
      
      However, it was overlooked that this introduced an new error path before
      the new mm_struct's ->uprobes_state.xol_area has been set to NULL after
      being copied from the old mm_struct by the memcpy in dup_mm().  For a
      task that has previously hit a uprobe tracepoint, this resulted in the
      'struct xol_area' being freed multiple times if the task was killed at
      just the right time while forking.
      
      Fix it by setting ->uprobes_state.xol_area to NULL in mm_init() rather
      than in uprobe_dup_mmap().
      
      With CONFIG_UPROBE_EVENTS=y, the bug can be reproduced by the same C
      program given by commit 2b7e8665 ("fork: fix incorrect fput of
      ->exe_file causing use-after-free"), provided that a uprobe tracepoint
      has been set on the fork_thread() function.  For example:
      
          $ gcc reproducer.c -o reproducer -lpthread
          $ nm reproducer | grep fork_thread
          0000000000400719 t fork_thread
          $ echo "p $PWD/reproducer:0x719" > /sys/kernel/debug/tracing/uprobe_events
          $ echo 1 > /sys/kernel/debug/tracing/events/uprobes/enable
          $ ./reproducer
      
      Here is the use-after-free reported by KASAN:
      
          BUG: KASAN: use-after-free in uprobe_clear_state+0x1c4/0x200
          Read of size 8 at addr ffff8800320a8b88 by task reproducer/198
      
          CPU: 1 PID: 198 Comm: reproducer Not tainted 4.13.0-rc7-00015-g36fde05f #255
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
          Call Trace:
           dump_stack+0xdb/0x185
           print_address_description+0x7e/0x290
           kasan_report+0x23b/0x350
           __asan_report_load8_noabort+0x19/0x20
           uprobe_clear_state+0x1c4/0x200
           mmput+0xd6/0x360
           do_exit+0x740/0x1670
           do_group_exit+0x13f/0x380
           get_signal+0x597/0x17d0
           do_signal+0x99/0x1df0
           exit_to_usermode_loop+0x166/0x1e0
           syscall_return_slowpath+0x258/0x2c0
           entry_SYSCALL_64_fastpath+0xbc/0xbe
      
          ...
      
          Allocated by task 199:
           save_stack_trace+0x1b/0x20
           kasan_kmalloc+0xfc/0x180
           kmem_cache_alloc_trace+0xf3/0x330
           __create_xol_area+0x10f/0x780
           uprobe_notify_resume+0x1674/0x2210
           exit_to_usermode_loop+0x150/0x1e0
           prepare_exit_to_usermode+0x14b/0x180
           retint_user+0x8/0x20
      
          Freed by task 199:
           save_stack_trace+0x1b/0x20
           kasan_slab_free+0xa8/0x1a0
           kfree+0xba/0x210
           uprobe_clear_state+0x151/0x200
           mmput+0xd6/0x360
           copy_process.part.8+0x605f/0x65d0
           _do_fork+0x1a5/0xbd0
           SyS_clone+0x19/0x20
           do_syscall_64+0x22f/0x660
           return_from_SYSCALL_64+0x0/0x7a
      
      Note: without KASAN, you may instead see a "Bad page state" message, or
      simply a general protection fault.
      
      Link: http://lkml.kernel.org/r/20170830033303.17927-1-ebiggers3@gmail.com
      Fixes: 7c051267 ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
      Signed-off-by: NEric Biggers <ebiggers@google.com>
      Reported-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>    [4.7+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      355627f5
  9. 29 8月, 2017 4 次提交
    • K
      perf/core, x86: Add PERF_SAMPLE_PHYS_ADDR · fc7ce9c7
      Kan Liang 提交于
      For understanding how the workload maps to memory channels and hardware
      behavior, it's very important to collect address maps with physical
      addresses. For example, 3D XPoint access can only be found by filtering
      the physical address.
      
      Add a new sample type for physical address.
      
      perf already has a facility to collect data virtual address. This patch
      introduces a function to convert the virtual address to physical address.
      The function is quite generic and can be extended to any architecture as
      long as a virtual address is provided.
      
       - For kernel direct mapping addresses, virt_to_phys is used to convert
         the virtual addresses to physical address.
      
       - For user virtual addresses, __get_user_pages_fast is used to walk the
         pages tables for user physical address.
      
       - This does not work for vmalloc addresses right now. These are not
         resolved, but code to do that could be added.
      
      The new sample type requires collecting the virtual address. The
      virtual address will not be output unless SAMPLE_ADDR is applied.
      
      For security, the physical address can only be exposed to root or
      privileged user.
      Tested-by: NMadhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: mpe@ellerman.id.au
      Link: http://lkml.kernel.org/r/1503967969-48278-1-git-send-email-kan.liang@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fc7ce9c7
    • A
      perf/core, pt, bts: Get rid of itrace_started · 8d4e6c4c
      Alexander Shishkin 提交于
      I just noticed that hw.itrace_started and hw.config are aliased to the
      same location. Now, the PT driver happens to use both, which works out
      fine by sheer luck:
      
       - STORE(hw.itrace_start) is ordered before STORE(hw.config), in the
          program order, although there are no compiler barriers to ensure that,
      
       - to the perf_log_itrace_start() hw.itrace_start looks set at the same
         time as when it is intended to be set because both stores happen in the
         same path,
      
       - hw.config is never reset to zero in the PT driver.
      
      Now, the use of hw.config by the PT driver makes more sense (it being a
      HW PMU) than messing around with itrace_started, which is an awkward API
      to begin with.
      
      This patch replaces hw.itrace_started with an attach_state bit and an
      API call for the PMU drivers to use to communicate the condition.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20170330153956.25994-1-alexander.shishkin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8d4e6c4c
    • Z
      perf/ftrace: Fix double traces of perf on ftrace:function · 75e83876
      Zhou Chengming 提交于
      When running perf on the ftrace:function tracepoint, there is a bug
      which can be reproduced by:
      
        perf record -e ftrace:function -a sleep 20 &
        perf record -e ftrace:function ls
        perf script
      
                    ls 10304 [005]   171.853235: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853237: ftrace:function:
        perf_output_begin
                    ls 10304 [005]   171.853239: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853240: ftrace:function:
        task_tgid_nr_ns
                    ls 10304 [005]   171.853242: ftrace:function:
        __task_pid_nr_ns
                    ls 10304 [005]   171.853244: ftrace:function:
        __task_pid_nr_ns
      
      We can see that all the function traces are doubled.
      
      The problem is caused by the inconsistency of the register
      function perf_ftrace_event_register() with the probe function
      perf_ftrace_function_call(). The former registers one probe
      for every perf_event. And the latter handles all perf_events
      on the current cpu. So when two perf_events on the current cpu,
      the traces of them will be doubled.
      
      So this patch adds an extra parameter "event" for perf_tp_event,
      only send sample data to this event when it's not NULL.
      Signed-off-by: NZhou Chengming <zhouchengming1@huawei.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: huawei.libin@huawei.com
      Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      75e83876
    • M
      perf/core: Fix potential double-fetch bug · f12f42ac
      Meng Xu 提交于
      While examining the kernel source code, I found a dangerous operation that
      could turn into a double-fetch situation (a race condition bug) where the same
      userspace memory region are fetched twice into kernel with sanity checks after
      the first fetch while missing checks after the second fetch.
      
        1. The first fetch happens in line 9573 get_user(size, &uattr->size).
      
        2. Subsequently the 'size' variable undergoes a few sanity checks and
           transformations (line 9577 to 9584).
      
        3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size)
      
        4. Given that 'uattr' can be fully controlled in userspace, an attacker can
           race condition to override 'uattr->size' to arbitrary value (say, 0xFFFFFFFF)
           after the first fetch but before the second fetch. The changed value will be
           copied to 'attr->size'.
      
        5. There is no further checks on 'attr->size' until the end of this function,
           and once the function returns, we lose the context to verify that 'attr->size'
           conforms to the sanity checks performed in step 2 (line 9577 to 9584).
      
        6. My manual analysis shows that 'attr->size' is not used elsewhere later,
           so, there is no working exploit against it right now. However, this could
           easily turns to an exploitable one if careless developers start to use
           'attr->size' later.
      
      To fix this, override 'attr->size' from the second fetch to the one from the
      first fetch, regardless of what is actually copied in.
      
      In this way, it is assured that 'attr->size' is consistent with the checks
      performed after the first fetch.
      Signed-off-by: NMeng Xu <mengxu.gatech@gmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: alexander.shishkin@linux.intel.com
      Cc: meng.xu@gatech.edu
      Cc: sanidhya@gatech.edu
      Cc: taesoo@gatech.edu
      Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.eduSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f12f42ac
  10. 25 8月, 2017 5 次提交
    • J
      tracing, perf: Adjust code layout in get_recursion_context() · d0618410
      Jesper Dangaard Brouer 提交于
      In an XDP redirect applications using tracepoint xdp:xdp_redirect to
      diagnose TX overrun, I noticed perf_swevent_get_recursion_context()
      was consuming 2% CPU. This was reduced to 1.85% with this simple
      change.
      
      Looking at the annotated asm code, it was clear that the unlikely case
      in_nmi() test was chosen (by the compiler) as the most likely
      event/branch.  This small adjustment makes the compiler (GCC version
      7.1.1 20170622 (Red Hat 7.1.1-3)) put in_nmi() as an unlikely branch.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/150342256382.16595.986861478681783732.stgit@firesoulSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d0618410
    • O
      perf/core: Don't report zero PIDs for exiting tasks · 1d953111
      Oleg Nesterov 提交于
      The exiting/dead task has no PIDs and in this case perf_event_pid/tid()
      return zero, change them to return -1 to distinguish this case from
      idle threads.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170822155928.GA6892@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1d953111
    • W
      perf/aux: Ensure aux_wakeup represents most recent wakeup index · d9a50b02
      Will Deacon 提交于
      The aux_watermark member of struct ring_buffer represents the period (in
      terms of bytes) at which wakeup events should be generated when data is
      written to the aux buffer in non-snapshot mode. On hardware that cannot
      generate an interrupt when the aux_head reaches an arbitrary wakeup index
      (such as ARM SPE), the aux_head sampled from handle->head in
      perf_aux_output_{skip,end} may in fact be past the wakeup index. This
      can lead to wakeup slowly falling behind the head. For example, consider
      the case where hardware can only generate an interrupt on a page-boundary
      and the aux buffer is initialised as follows:
      
        // Buffer size is 2 * PAGE_SIZE
        rb->aux_head = rb->aux_wakeup = 0
        rb->aux_watermark = PAGE_SIZE / 2
      
      following the first perf_aux_output_begin call, the handle is
      initialised with:
      
        handle->head = 0
        handle->size = 2 * PAGE_SIZE
        handle->wakeup = PAGE_SIZE / 2
      
      and the hardware will be programmed to generate an interrupt at
      PAGE_SIZE.
      
      When the interrupt is raised, the hardware head will be at PAGE_SIZE,
      so calling perf_aux_output_end(handle, PAGE_SIZE) puts the ring buffer
      into the following state:
      
        rb->aux_head = PAGE_SIZE
        rb->aux_wakeup = PAGE_SIZE / 2
        rb->aux_watermark = PAGE_SIZE / 2
      
      and then the next call to perf_aux_output_begin will result in:
      
        handle->head = handle->wakeup = PAGE_SIZE
      
      for which the semantics are unclear and, for a smaller aux_watermark
      (e.g. PAGE_SIZE / 4), then the wakeup would in fact be behind head at
      this point.
      
      This patch fixes the problem by rounding down the aux_head (as sampled
      from the handle) to the nearest aux_watermark boundary when updating
      rb->aux_wakeup, therefore taking into account any overruns by the
      hardware.
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1502900297-21839-2-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d9a50b02
    • W
      perf/aux: Make aux_{head,wakeup} ring_buffer members long · 2ab346cf
      Will Deacon 提交于
      The aux_head and aux_wakeup members of struct ring_buffer are defined
      using the local_t type, despite the fact that they are only accessed via
      the perf_aux_output_*() functions, which cannot race with each other for a
      given ring buffer.
      
      This patch changes the type of the members to long, so we can avoid
      using the local_*() API where it isn't needed.
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arm-kernel@lists.infradead.org
      Link: http://lkml.kernel.org/r/1502900297-21839-1-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2ab346cf
    • M
      perf/core: Fix group {cpu,task} validation · 64aee2a9
      Mark Rutland 提交于
      Regardless of which events form a group, it does not make sense for the
      events to target different tasks and/or CPUs, as this leaves the group
      inconsistent and impossible to schedule. The core perf code assumes that
      these are consistent across (successfully intialised) groups.
      
      Core perf code only verifies this when moving SW events into a HW
      context. Thus, we can violate this requirement for pure SW groups and
      pure HW groups, unless the relevant PMU driver happens to perform this
      verification itself. These mismatched groups subsequently wreak havoc
      elsewhere.
      
      For example, we handle watchpoints as SW events, and reserve watchpoint
      HW on a per-CPU basis at pmu::event_init() time to ensure that any event
      that is initialised is guaranteed to have a slot at pmu::add() time.
      However, the core code only checks the group leader's cpu filter (via
      event_filter_match()), and can thus install follower events onto CPUs
      violating thier (mismatched) CPU filters, potentially installing them
      into a CPU without sufficient reserved slots.
      
      This can be triggered with the below test case, resulting in warnings
      from arch backends.
      
        #define _GNU_SOURCE
        #include <linux/hw_breakpoint.h>
        #include <linux/perf_event.h>
        #include <sched.h>
        #include <stdio.h>
        #include <sys/prctl.h>
        #include <sys/syscall.h>
        #include <unistd.h>
      
        static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
      			   int group_fd, unsigned long flags)
        {
      	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
        }
      
        char watched_char;
      
        struct perf_event_attr wp_attr = {
      	.type = PERF_TYPE_BREAKPOINT,
      	.bp_type = HW_BREAKPOINT_RW,
      	.bp_addr = (unsigned long)&watched_char,
      	.bp_len = 1,
      	.size = sizeof(wp_attr),
        };
      
        int main(int argc, char *argv[])
        {
      	int leader, ret;
      	cpu_set_t cpus;
      
      	/*
      	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
      	 */
      	CPU_ZERO(&cpus);
      	CPU_SET(0, &cpus);
      	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
      	if (ret) {
      		printf("Unable to set cpu affinity\n");
      		return 1;
      	}
      
      	/* open leader event, bound to this task, CPU0 only */
      	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	if (leader < 0) {
      		printf("Couldn't open leader: %d\n", leader);
      		return 1;
      	}
      
      	/*
      	 * Open a follower event that is bound to the same task, but a
      	 * different CPU. This means that the group should never be possible to
      	 * schedule.
      	 */
      	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
      	if (ret < 0) {
      		printf("Couldn't open mismatched follower: %d\n", ret);
      		return 1;
      	} else {
      		printf("Opened leader/follower with mismastched CPUs\n");
      	}
      
      	/*
      	 * Open as many independent events as we can, all bound to the same
      	 * task, CPU0 only.
      	 */
      	do {
      		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
      	} while (ret >= 0);
      
      	/*
      	 * Force enable/disble all events to trigger the erronoeous
      	 * installation of the follower event.
      	 */
      	printf("Opened all events. Toggling..\n");
      	for (;;) {
      		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
      		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
      	}
      
      	return 0;
        }
      
      Fix this by validating this requirement regardless of whether we're
      moving events.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhou Chengming <zhouchengming1@huawei.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/1498142498-15758-1-git-send-email-mark.rutland@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      64aee2a9
  11. 10 8月, 2017 3 次提交
  12. 08 8月, 2017 1 次提交
    • Y
      bpf: add support for sys_enter_* and sys_exit_* tracepoints · cf5f5cea
      Yonghong Song 提交于
      Currently, bpf programs cannot be attached to sys_enter_* and sys_exit_*
      style tracepoints. The iovisor/bcc issue #748
      (https://github.com/iovisor/bcc/issues/748) documents this issue.
      For example, if you try to attach a bpf program to tracepoints
      syscalls/sys_enter_newfstat, you will get the following error:
         # ./tools/trace.py t:syscalls:sys_enter_newfstat
         Ioctl(PERF_EVENT_IOC_SET_BPF): Invalid argument
         Failed to attach BPF to tracepoint
      
      The main reason is that syscalls/sys_enter_* and syscalls/sys_exit_*
      tracepoints are treated differently from other tracepoints and there
      is no bpf hook to it.
      
      This patch adds bpf support for these syscalls tracepoints by
        . permitting bpf attachment in ioctl PERF_EVENT_IOC_SET_BPF
        . calling bpf programs in perf_syscall_enter and perf_syscall_exit
      
      The legality of bpf program ctx access is also checked.
      Function trace_event_get_offsets returns correct max offset for each
      specific syscall tracepoint, which is compared against the maximum offset
      access in bpf program.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf5f5cea
  13. 02 8月, 2017 1 次提交
    • V
      x86/perf/cqm: Wipe out perf based cqm · c39a0e2c
      Vikas Shivappa 提交于
      'perf cqm' never worked due to the incompatibility between perf
      infrastructure and cqm hardware support.  The hardware uses RMIDs to
      track the llc occupancy of tasks and these RMIDs are per package. This
      makes monitoring a hierarchy like cgroup along with monitoring of tasks
      separately difficult and several patches sent to lkml to fix them were
      NACKed. Further more, the following issues in the current perf cqm make
      it almost unusable:
      
          1. No support to monitor the same group of tasks for which we do
          allocation using resctrl.
      
          2. It gives random and inaccurate data (mostly 0s) once we run out
          of RMIDs due to issues in Recycling.
      
          3. Recycling results in inaccuracy of data because we cannot
          guarantee that the RMID was stolen from a task when it was not
          pulling data into cache or even when it pulled the least data. Also
          for monitoring llc_occupancy, if we stop using an RMID_x and then
          start using an RMID_y after we reclaim an RMID from an other event,
          we miss accounting all the occupancy that was tagged to RMID_x at a
          later perf_count.
      
          2. Recycling code makes the monitoring code complex including
          scheduling because the event can lose RMID any time. Since MBM
          counters count bandwidth for a period of time by taking snap shot of
          total bytes at two different times, recycling complicates the way we
          count MBM in a hierarchy. Also we need a spin lock while we do the
          processing to account for MBM counter overflow. We also currently
          use a spin lock in scheduling to prevent the RMID from being taken
          away.
      
          4. Lack of support when we run different kind of event like task,
          system-wide and cgroup events together. Data mostly prints 0s. This
          is also because we can have only one RMID tied to a cpu as defined
          by the cqm hardware but a perf can at the same time tie multiple
          events during one sched_in.
      
          5. No support of monitoring a group of tasks. There is partial support
          for cgroup but it does not work once there is a hierarchy of cgroups
          or if we want to monitor a task in a cgroup and the cgroup itself.
      
          6. No support for monitoring tasks for the lifetime without perf
          overhead.
      
          7. It reported the aggregate cache occupancy or memory bandwidth over
          all sockets. But most cloud and VMM based use cases want to know the
          individual per-socket usage.
      Signed-off-by: NVikas Shivappa <vikas.shivappa@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: ravi.v.shankar@intel.com
      Cc: tony.luck@intel.com
      Cc: fenghua.yu@intel.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: vikas.shivappa@intel.com
      Cc: ak@linux.intel.com
      Cc: davidcc@google.com
      Cc: reinette.chatre@intel.com
      Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com
      c39a0e2c
  14. 21 7月, 2017 2 次提交
    • T
      cgroup: implement cgroup v2 thread support · 8cfd8147
      Tejun Heo 提交于
      This patch implements cgroup v2 thread support.  The goal of the
      thread mode is supporting hierarchical accounting and control at
      thread granularity while staying inside the resource domain model
      which allows coordination across different resource controllers and
      handling of anonymous resource consumptions.
      
      A cgroup is always created as a domain and can be made threaded by
      writing to the "cgroup.type" file.  When a cgroup becomes threaded, it
      becomes a member of a threaded subtree which is anchored at the
      closest ancestor which isn't threaded.
      
      The threads of the processes which are in a threaded subtree can be
      placed anywhere without being restricted by process granularity or
      no-internal-process constraint.  Note that the threads aren't allowed
      to escape to a different threaded subtree.  To be used inside a
      threaded subtree, a controller should explicitly support threaded mode
      and be able to handle internal competition in the way which is
      appropriate for the resource.
      
      The root of a threaded subtree, the nearest ancestor which isn't
      threaded, is called the threaded domain and serves as the resource
      domain for the whole subtree.  This is the last cgroup where domain
      controllers are operational and where all the domain-level resource
      consumptions in the subtree are accounted.  This allows threaded
      controllers to operate at thread granularity when requested while
      staying inside the scope of system-level resource distribution.
      
      As the root cgroup is exempt from the no-internal-process constraint,
      it can serve as both a threaded domain and a parent to normal cgroups,
      so, unlike non-root cgroups, the root cgroup can have both domain and
      threaded children.
      
      Internally, in a threaded subtree, each css_set has its ->dom_cset
      pointing to a matching css_set which belongs to the threaded domain.
      This ensures that thread root level cgroup_subsys_state for all
      threaded controllers are readily accessible for domain-level
      operations.
      
      This patch enables threaded mode for the pids and perf_events
      controllers.  Neither has to worry about domain-level resource
      consumptions and it's enough to simply set the flag.
      
      For more details on the interface and behavior of the thread mode,
      please refer to the section 2-2-2 in Documentation/cgroup-v2.txt added
      by this patch.
      
      v5: - Dropped silly no-op ->dom_cgrp init from cgroup_create().
            Spotted by Waiman.
          - Documentation updated as suggested by Waiman.
          - cgroup.type content slightly reformatted.
          - Mark the debug controller threaded.
      
      v4: - Updated to the general idea of marking specific cgroups
            domain/threaded as suggested by PeterZ.
      
      v3: - Dropped "join" and always make mixed children join the parent's
            threaded subtree.
      
      v2: - After discussions with Waiman, support for mixed thread mode is
            added.  This should address the issue that Peter pointed out
            where any nesting should be avoided for thread subtrees while
            coexisting with other domain cgroups.
          - Enabling / disabling thread mode now piggy backs on the existing
            control mask update mechanism.
          - Bug fixes and cleanup.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Waiman Long <longman@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      8cfd8147
    • J
      perf/core: Fix locking for children siblings group read · 2aeb1883
      Jiri Olsa 提交于
      We're missing ctx lock when iterating children siblings
      within the perf_read path for group reading. Following
      race and crash can happen:
      
      User space doing read syscall on event group leader:
      
      T1:
        perf_read
          lock event->ctx->mutex
          perf_read_group
            lock leader->child_mutex
            __perf_read_group_add(child)
              list_for_each_entry(sub, &leader->sibling_list, group_entry)
      
      ---->   sub might be invalid at this point, because it could
              get removed via perf_event_exit_task_context in T2
      
      Child exiting and cleaning up its events:
      
      T2:
        perf_event_exit_task_context
          lock ctx->mutex
          list_for_each_entry_safe(child_event, next, &child_ctx->event_list,...
            perf_event_exit_event(child)
              lock ctx->lock
              perf_group_detach(child)
              unlock ctx->lock
      
      ---->   child is removed from sibling_list without any sync
              with T1 path above
      
              ...
              free_event(child)
      
      Before the child is removed from the leader's child_list,
      (and thus is omitted from perf_read_group processing), we
      need to ensure that perf_read_group touches child's
      siblings under its ctx->lock.
      
      Peter further notes:
      
      | One additional note; this bug got exposed by commit:
      |
      |   ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      |
      | which made it possible to actually trigger this code-path.
      Tested-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ba5213ae ("perf/core: Correct event creation with PERF_FORMAT_GROUP")
      Link: http://lkml.kernel.org/r/20170720141455.2106-1-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2aeb1883
  15. 20 7月, 2017 1 次提交
    • A
      perf/core: Fix scheduling regression of pinned groups · 3bda69c1
      Alexander Shishkin 提交于
      Vince Weaver reported:
      
      > I was tracking down some regressions in my perf_event_test testsuite.
      > Some of the tests broke in the 4.11-rc1 timeframe.
      >
      > I've bisected one of them, this report is about
      >	tests/overflow/simul_oneshot_group_overflow
      > This test creates an event group containing two sampling events, set
      > to overflow to a signal handler (which disables and then refreshes the
      > event).
      >
      > On a good kernel you get the following:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 946 (perf::instructions/1000000)
      > 		fd 4 overflows: 473 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946379875
      > 		Count 1: 946365218
      >
      > With the broken kernels you get:
      > 	Event perf::instructions with period 1000000
      > 	Event perf::instructions with period 2000000
      > 		fd 3 overflows: 938 (perf::instructions/1000000)
      > 		fd 4 overflows: 318 (perf::instructions/2000000)
      > 	Ending counts:
      > 		Count 0: 946373080
      > 		Count 1: 653373058
      
      The root cause of the bug is that the following commit:
      
        487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      
      erronously assumed that event's 'pinned' setting determines whether the
      event belongs to a pinned group or not, but in fact, it's the group
      leader's pinned state that matters.
      
      This was discovered by Vince in the test case described above, where two instruction
      counters are grouped, the group leader is pinned, but the other event is not;
      in the regressed case the counters were off by 33% (the difference between events'
      periods), but should be the same within the error margin.
      
      Fix the problem by looking at the group leader's pinning.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Tested-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 487f05e1 ("perf/core: Optimize event rescheduling on active contexts")
      Link: http://lkml.kernel.org/r/87lgnmvw7h.fsf@ashishki-desk.ger.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3bda69c1
  16. 11 7月, 2017 1 次提交
  17. 21 6月, 2017 1 次提交
  18. 08 6月, 2017 3 次提交
    • M
      perf/core: Remove unused perf_cgroup_event_cgrp_time() function · d0fabd1c
      Matthias Kaehlcke 提交于
      The function was added by commit e5d1367f ("perf: Add cgroup
      support") in 2011 and hasn't been used since then. Removing it fixes the
      following warning when building with Clang:
      
          kernel/events/core.c:696:19: error: unused function 'perf_cgroup_event_cgrp_time' [-Werror,-Wunused-function]
      Signed-off-by: NMatthias Kaehlcke <mka@chromium.org>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Douglas Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170523215132.189049-1-mka@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d0fabd1c
    • P
      perf/core: Correct event creation with PERF_FORMAT_GROUP · ba5213ae
      Peter Zijlstra 提交于
      Andi was asking about PERF_FORMAT_GROUP vs inherited events, which led
      to the discovery of a bug from commit:
      
        3dab77fb ("perf: Rework/fix the whole read vs group stuff")
      
       -       PERF_SAMPLE_GROUP                       = 1U << 4,
       +       PERF_SAMPLE_READ                        = 1U << 4,
      
       -       if (attr->inherit && (attr->sample_type & PERF_SAMPLE_GROUP))
       +       if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
      
      is a clear fail :/
      
      While this changes user visible behaviour; it was previously possible
      to create an inherited event with PERF_SAMPLE_READ; this is deemed
      acceptible because its results were always incorrect.
      Reported-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Fixes:  3dab77fb ("perf: Rework/fix the whole read vs group stuff")
      Link: http://lkml.kernel.org/r/20170530094512.dy2nljns2uq7qa3j@hirez.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ba5213ae
    • J
      perf/core: Drop kernel samples even though :u is specified · cc1582c2
      Jin Yao 提交于
      When doing sampling, for example:
      
        perf record -e cycles:u ...
      
      On workloads that do a lot of kernel entry/exits we see kernel
      samples, even though :u is specified. This is due to skid existing.
      
      This might be a security issue because it can leak kernel addresses even
      though kernel sampling support is disabled.
      
      The patch drops the kernel samples if exclude_kernel is specified.
      
      For example, test on Haswell desktop:
      
        perf record -e cycles:u <mgen>
        perf report --stdio
      
      Before patch applied:
      
          99.77%  mgen     mgen              [.] buf_read
           0.20%  mgen     mgen              [.] rand_buf_init
           0.01%  mgen     [kernel.vmlinux]  [k] apic_timer_interrupt
           0.00%  mgen     mgen              [.] last_free_elem
           0.00%  mgen     libc-2.23.so      [.] __random_r
           0.00%  mgen     libc-2.23.so      [.] _int_malloc
           0.00%  mgen     mgen              [.] rand_array_init
           0.00%  mgen     [kernel.vmlinux]  [k] page_fault
           0.00%  mgen     libc-2.23.so      [.] __random
           0.00%  mgen     libc-2.23.so      [.] __strcasestr
           0.00%  mgen     ld-2.23.so        [.] strcmp
           0.00%  mgen     ld-2.23.so        [.] _dl_start
           0.00%  mgen     libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so        [.] _start
      
      We can see kernel symbols apic_timer_interrupt and page_fault.
      
      After patch applied:
      
          99.79%  mgen     mgen           [.] buf_read
           0.19%  mgen     mgen           [.] rand_buf_init
           0.00%  mgen     libc-2.23.so   [.] __random_r
           0.00%  mgen     mgen           [.] rand_array_init
           0.00%  mgen     mgen           [.] last_free_elem
           0.00%  mgen     libc-2.23.so   [.] vfprintf
           0.00%  mgen     libc-2.23.so   [.] rand
           0.00%  mgen     libc-2.23.so   [.] __random
           0.00%  mgen     libc-2.23.so   [.] _int_malloc
           0.00%  mgen     libc-2.23.so   [.] _IO_doallocbuf
           0.00%  mgen     ld-2.23.so     [.] do_lookup_x
           0.00%  mgen     ld-2.23.so     [.] open_verify.constprop.7
           0.00%  mgen     ld-2.23.so     [.] _dl_important_hwcaps
           0.00%  mgen     libc-2.23.so   [.] sched_setaffinity@@GLIBC_2.3.4
           0.00%  mgen     ld-2.23.so     [.] _start
      
      There are only userspace symbols.
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: acme@kernel.org
      Cc: jolsa@kernel.org
      Cc: kan.liang@intel.com
      Cc: mark.rutland@arm.com
      Cc: will.deacon@arm.com
      Cc: yao.jin@intel.com
      Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cc1582c2