1. 16 10月, 2018 1 次提交
  2. 09 8月, 2018 1 次提交
    • K
      perf map: Synthesize maps only for thread group leader · e5adfc3e
      Konstantin Khlebnikov 提交于
      Threads share map_groups, all map events are merged into it.
      
      Thus we could send mmaps only for thread group leader.  Otherwise it
      took ages to attach and record something from processes with many vmas
      and threads.
      
      Thread group leader could be already dead, but it seems perf cannot
      handle this case anyway.
      
      Testing dummy:
      
        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/mman.h>
        #include <pthread.h>
        #include <unistd.h>
      
        void *thread(void *arg) {
                pause();
        }
      
        int main(int argc, char **argv) {
              int threads = 10000;
              int vmas = 50000;
              pthread_t th;
              for (int i = 0; i < threads; i++)
                      pthread_create(&th, NULL, thread, NULL);
              for (int i = 0; i < vmas; i++)
                      mmap(NULL, 4096, (i & 1) ? PROT_READ : PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
              sleep(60);
              return 0;
        }
      
      Comment by Jiri Olsa:
      
      We actualy synthesize the group leader (if we found one) for the thread
      even if it's not present in the thread_map, so the process maps are
      always in data.
      Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/153363294102.396323.6277944760215058174.stgit@buzzSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e5adfc3e
  3. 23 5月, 2018 1 次提交
  4. 22 5月, 2018 1 次提交
  5. 27 4月, 2018 10 次提交
  6. 17 4月, 2018 1 次提交
  7. 17 2月, 2018 2 次提交
  8. 10 1月, 2018 1 次提交
  9. 02 11月, 2017 1 次提交
    • G
      License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman 提交于
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information it it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  10. 25 10月, 2017 1 次提交
    • M
      perf report: Use srcline from callchain for hist entries · 1fb7d06a
      Milian Wolff 提交于
      This also removes the symbol name from the srcline column, more on this
      below.
      
      This ensures we use the correct srcline, which could originate from a
      potentially inlined function. The hist entries used to query for the
      srcline based purely on the IP, which leads to wrong results for inlined
      entries.
      
      Before:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ..................................................................................................................................
        #
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            44.58%     0.00%  main+100
            44.58%     0.00%  std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  std::__complex_abs+100
            44.58%     0.00%  std::abs<double>+100
            44.58%     0.00%  std::norm<double>+100
            36.01%     0.00%  hypot+18446603487892193300
            25.81%     0.00%  main+41
            25.81%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.75%    25.75%  random.h:143
            18.39%     0.00%  main+57
            18.39%     0.00%  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  random.tcc:3330
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      After:
      
      ~~~~~
        perf report --inline -s srcline -g none --stdio
        ...
        # Children      Self  Source:Line
        # ........  ........  ...........................................
        #
            94.30%     1.19%  main.cpp:39
            94.23%     0.00%  __libc_start_main+18446603487898210537
            94.23%     0.00%  _start+41
            48.44%     1.70%  random.h:1823
            48.44%     0.00%  random.h:1814
            46.74%     2.53%  random.h:185
            44.68%     0.10%  complex:589
            44.68%     0.00%  complex:597
            44.68%     0.00%  complex:654
            44.68%     0.00%  complex:664
            40.61%    13.80%  random.tcc:3330
            36.01%     0.00%  hypot+18446603487892193300
            26.81%     0.00%  random.h:151
            26.81%     0.00%  random.h:332
            25.75%    25.75%  random.h:143
             5.64%     0.00%  ??:0
             4.13%     4.13%  __hypot_finite+163
             4.13%     0.00%  __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Note that this change removes the symbol from the source:line hist
      column. If this information is desired, users should explicitly query
      for it if needed. I.e. run this command instead:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ...........................................
        #
            94.30%     1.19%  [.] main                                                                                                                             main.cpp:39
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1814
            48.44%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  random.h:1823
            46.74%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  random.h:185
            44.68%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              complex:654
            44.68%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     complex:589
            44.68%     0.00%  [.] std::abs<double> (inlined)                                                                                                       complex:597
            44.68%     0.00%  [.] std::norm<double> (inlined)                                                                                                      complex:664
            39.80%    13.59%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            26.81%     0.00%  [.] std::__detail::__mod<unsigned long, 2147483647ul, 16807ul, 0ul> (inlined)                                                        random.h:151
            26.81%     0.00%  [.] std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>::operator() (inlined)                                 random.h:332
            25.75%     0.00%  [.] std::__detail::_Mod<unsigned long, 2147483647ul, 16807ul, 0ul, true, true>::__calc (inlined)                                     random.h:143
            25.19%    25.19%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      
      Compared to the old behavior, this reduces duplication in the output.
      Before we used to print the symbol name in the srcline column even
      when the sym column was explicitly requested. I.e. the output was:
      
      ~~~~~
        perf report --inline -s sym,srcline -g none --stdio
        ...
        # To display the perf.data header info, please use --header/--header-only options.
        #
        #
        # Total Lost Samples: 0
        #
        # Samples: 1K of event 'cycles:uppp'
        # Event count (approx.): 1381229476
        #
        # Children      Self  Symbol                                                                                                                               Source:Line
        # ........  ........  ...................................................................................................................................  ..................................................................................................................................
        #
            94.23%     0.00%  [.] __libc_start_main                                                                                                                __libc_start_main+18446603487898210537
            94.23%     0.00%  [.] _start                                                                                                                           _start+41
            44.58%     0.00%  [.] main                                                                                                                             main+100
            44.58%     0.00%  [.] std::_Norm_helper<true>::_S_do_it<double> (inlined)                                                                              std::_Norm_helper<true>::_S_do_it<double>+100
            44.58%     0.00%  [.] std::__complex_abs (inlined)                                                                                                     std::__complex_abs+100
            44.58%     0.00%  [.] std::abs<double> (inlined)                                                                                                       std::abs<double>+100
            44.58%     0.00%  [.] std::norm<double> (inlined)                                                                                                      std::norm<double>+100
            36.01%     0.00%  [.] hypot                                                                                                                            hypot+18446603487892193300
            25.81%     0.00%  [.] main                                                                                                                             main+41
            25.81%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+41
            25.81%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+41
            25.69%    25.69%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.h:143
            18.39%     0.00%  [.] main                                                                                                                             main+57
            18.39%     0.00%  [.] std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator() (inlined)  std::__detail::_Adaptor<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul>, double>::operator()+57
            18.39%     0.00%  [.] std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> > (inlined)  std::uniform_real_distribution<double>::operator()<std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >+57
            13.80%    13.80%  [.] std::generate_canonical<double, 53ul, std::linear_congruential_engine<unsigned long, 16807ul, 0ul, 2147483647ul> >               random.tcc:3330
             4.13%     4.13%  [.] __hypot_finite                                                                                                                   __hypot_finite+163
             4.13%     0.00%  [.] __hypot_finite                                                                                                                   __hypot_finite+18446603487892193443
      ...
      ~~~~~
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Reviewed-by: NAndi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20171019113836.5548-5-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fb7d06a
  11. 03 10月, 2017 2 次提交
    • K
      perf top: Add option to set the number of thread for event synthesize · 0c6b4994
      Kan Liang 提交于
      Using UINT_MAX to indicate the default thread#, which is the max number
      of online CPU.
      
      Committer testing:
      
        # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9
        # cat /tmp/output
               ? (     ?   ):  ... [continued]: clone()) = 26651 (perf)
           0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf)
           0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf)
           0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf)
           0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf)
           0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf)
           0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf)
           0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf)
           1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf)
         246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0c6b4994
    • K
      perf top: Implement multithreading for perf_event__synthesize_threads · 340b47f5
      Kan Liang 提交于
      The proc files which is sorted with alphabetical order are evenly
      assigned to several synthesize threads to be processed in parallel.
      
      For 'perf top', the threads number hard code to online CPU number. The
      following patch will introduce an option to set it.
      
      For other perf tools, the thread number is 1. Because the process
      function is not ready for multithreading, e.g.
      process_synthesized_event.
      
      This patch series only support event synthesize multithreading for 'perf
      top'. For other tools, it can be done separately later.
      
      With multithread applied, the total processing time can get up to 1.56x
      speedup on Knights Mill for 'perf top'.
      
      For specific single event processing, the processing time could increase
      because of the lock contention. So proc_map_timeout may need to be
      increased. Otherwise some proc maps will be truncated.
      
      Based on my test, increasing the proc_map_timeout has small impact
      on the total processing time. The total processing time still get 1.49x
      speedup on Knights Mill after increasing the proc_map_timeout.
      The patch itself doesn't increase the proc_map_timeout.
      
      Doesn't need to implement multithreading for per task monitoring,
      perf_event__synthesize_thread_map. It doesn't have performance issue.
      
      Committer testing:
      
        # getconf _NPROCESSORS_ONLN
        4
        # perf trace --no-inherit -e clone -o /tmp/output perf top
        # tail -4 /tmp/bla
           0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
           0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
           0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
         246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      340b47f5
  12. 13 9月, 2017 2 次提交
  13. 19 7月, 2017 1 次提交
    • D
      perf tools: Add feature header record to pipe-mode · e9def1b2
      David Carrillo-Cisneros 提交于
      Add header record types to pipe-mode, reusing the functions
      used in file-mode and leveraging the new struct feat_fd.
      
      For alignment, check that synthesized events don't exceed
      pagesize.
      
      Add the perf_event__synthesize_feature event call back to
      process the new header records.
      
      Before this patch:
      
        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        ...
      
      After this patch:
        $ perf record -o - -e cycles sleep 1 | perf report --stdio --header
        # ========
        # captured on: Mon May 22 16:33:43 2017
        # ========
        #
        # hostname : my_hostname
        # os release : 4.11.0-dbx-up_perf
        # perf version : 4.11.rc6.g6277c80
        # arch : x86_64
        # nrcpus online : 72
        # nrcpus avail : 72
        # cpudesc : Intel(R) Xeon(R) CPU E5-2696 v3 @ 2.30GHz
        # cpuid : GenuineIntel,6,63,2
        # total memory : 263457192 kB
        # cmdline : /root/perf record -o - -e cycles -c 100000 sleep 1
        # HEADER_CPU_TOPOLOGY info available, use -I to display
        # HEADER_NUMA_TOPOLOGY info available, use -I to display
        # pmu mappings: intel_bts = 6, uncore_imc_4 = 22, uncore_sbox_1 = 47, uncore_cbox_5 = 33, uncore_ha_0 = 16, uncore_cbox
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 0.000 MB - ]
        ...
      
      Support added for the subcommands: report, inject, annotate and script.
      Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Simon Que <sque@chromium.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170718042549.145161-16-davidcc@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9def1b2
  14. 03 5月, 2017 1 次提交
    • A
      perf symbols: Accept symbols starting at address 0 · b843f62a
      Arnaldo Carvalho de Melo 提交于
      That is the case of _text on s390, and we have some functions that return an
      address, using address zero to report problems, oops.
      
      This would lead the symbol loading routines to not use "_text" as the reference
      relocation symbol, or the first symbol for the kernel, but use instead
      "_stext", that is at the same address on x86_64 and others, but not on s390:
      
        [acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
        0000000000000000 T _text
        0000000000000418 t iplstart
        0000000000000800 T start
        000000000000080a t .base
        000000000000082e t .sk8x8
        0000000000000834 t .gotr
        0000000000000842 t .cmd
        0000000000000846 t .parm
        000000000000084a t .lowcase
        0000000000010000 T startup
        0000000000010010 T startup_kdump
        0000000000010214 t startup_kdump_relocated
        0000000000011000 T startup_continue
        00000000000112a0 T _ehead
        0000000000100000 T _stext
        [acme@localhost perf-4.11.0-rc6]$
      
      Which in turn would make 'perf test vmlinux' to fail because it wouldn't find
      the symbols before "_stext" in kallsyms.
      
      Fix it by using the return value only for errors and storing the
      address, when the symbol is successfully found, in a provided pointer
      arg.
      
      Before this patch:
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 40693
        Looking at the vmlinux_path (8 entries long)
        Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
        ERR : 0: _text not on kallsyms
        ERR : 0x418: iplstart not on kallsyms
        ERR : 0x800: start not on kallsyms
        ERR : 0x80a: .base not on kallsyms
        ERR : 0x82e: .sk8x8 not on kallsyms
        ERR : 0x834: .gotr not on kallsyms
        ERR : 0x842: .cmd not on kallsyms
        ERR : 0x846: .parm not on kallsyms
        ERR : 0x84a: .lowcase not on kallsyms
        ERR : 0x10000: startup not on kallsyms
        ERR : 0x10010: startup_kdump not on kallsyms
        ERR : 0x10214: startup_kdump_relocated not on kallsyms
        ERR : 0x11000: startup_continue not on kallsyms
        ERR : 0x112a0: _ehead not on kallsyms
        <SNIP warnings>
        test child finished with -1
        ---- end ----
        vmlinux symtab matches kallsyms: FAILED!
        [acme@localhost perf-4.11.0-rc6]$
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 47160
        <SNIP warnings>
        test child finished with 0
        ---- end ----
        vmlinux symtab matches kallsyms: Ok
        [acme@localhost perf-4.11.0-rc6]$
      Reported-by: NMichael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b843f62a
  15. 26 4月, 2017 1 次提交
  16. 25 4月, 2017 2 次提交
  17. 20 4月, 2017 6 次提交
  18. 13 4月, 2017 1 次提交
  19. 12 4月, 2017 1 次提交
  20. 17 3月, 2017 1 次提交
    • A
      perf tools: Handle partial AUX records and print a warning · 05a1f47e
      Alexander Shishkin 提交于
      This patch decodes the 'partial' flag in AUX records and prints
      a warning to the user, so that they don't have to guess why their
      PT traces contain gaps (or missing altogether):
      
        Warning:
        AUX data had gaps in it 8 times out of 8!
      
        Are you running a KVM guest in the background?
      
      Trying to be even more helpful, we will detect if the user's kvm driver sets up
      exclusive VMX root mode for the entire lifespan of the kvm process:
      
        Reloading kvm_intel module with vmm_exclusive=0
        will reduce the gaps to only guest's timeslices.
      
      Note however, that you'll still have gaps in cpu-wide traces even with
      vmm_exclusive=0, but the number of gaps will be below 100% (as opposed to the
      above example).
      
      Currently this is the only reason for partial records.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vince@deater.net>
      Link: http://lkml.kernel.org/r/8760j941ig.fsf@ashishki-desk.ger.corp.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      05a1f47e
  21. 16 3月, 2017 1 次提交
    • S
      perf tools: Make perf_event__synthesize_mmap_events() scale · 88b897a3
      Stephane Eranian 提交于
      This patch significantly improves the execution time of
      perf_event__synthesize_mmap_events() when running perf record on systems
      where processes have lots of threads.
      
      It just happens that cat /proc/pid/maps support uses a O(N^2) algorithm to
      generate each map line in the maps file.  If you have 1000 threads, then you
      have necessarily 1000 stacks.  For each vma, you need to check if it
      corresponds to a thread's stack.  With a large number of threads, this can take
      a very long time. I have seen latencies >> 10mn.
      
      As of today, perf does not use the fact that a mapping is a stack, therefore we
      can work around the issue by using /proc/pid/tasks/pid/maps.  This entry does
      not try to map a vma to stack and is thus much faster with no loss of
      functonality.
      
      The proc-map-timeout logic is kept in case users still want some upper limit.
      
      In V2, we fix the file path from /proc/pid/tasks/pid/maps to actual
      /proc/pid/task/pid/maps, tasks -> task.  Thanks Arnaldo for catching this.
      
      Committer note:
      
      This problem seems to have been elliminated in the kernel since commit :
      b18cb64e ("fs/proc: Stop trying to report thread stacks").
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20170315135059.GC2177@redhat.com
      Link: http://lkml.kernel.org/r/1489598233-25586-1-git-send-email-eranian@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      88b897a3
  22. 15 3月, 2017 1 次提交
    • H
      perf record: Synthesize namespace events for current processes · e907caf3
      Hari Bathini 提交于
      Synthesize PERF_RECORD_NAMESPACES events for processes that were running prior
      to invocation of perf record. The data for this is taken from /proc/$PID/ns.
      These changes make way for analyzing events with regard to namespaces.
      
      Committer notes:
      
      Check if 'tool' is NULL in perf_event__synthesize_namespaces(), as in the
      test__mmap_thread_lookup case, i.e. 'perf test Lookup mmap thread".
      
      Testing it:
      
        # ps axH > /tmp/allthreads
        # perf record -a --namespaces usleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.169 MB perf.data (8 samples) ]
        # perf report -D | grep PERF_RECORD_NAMESPACES | wc -l
        602
        # wc -l /tmp/allthreads
        601 /tmp/allthreads
        # tail /tmp/allthreads
        16951 pts/4    T      0:00 git rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
        16952 pts/4    T      0:00 /bin/sh /usr/libexec/git-core/git-rebase -i a033bf1bfacdaa25642e6bcc857a7d0f67cc3c92^
        17176 pts/4    T      0:00 git commit --amend --no-post-rewrite
        17204 pts/4    T      0:00 vim /home/acme/git/linux/.git/COMMIT_EDITMSG
        18939 ?        S      0:00 [kworker/2:1]
        18947 ?        S      0:00 [kworker/3:0]
        18974 ?        S      0:00 [kworker/1:0]
        19047 ?        S      0:00 [kworker/0:1]
        19152 pts/6    S+     0:00 weechat
        19153 pts/7    R+     0:00 ps axH
        # perf report -D | grep PERF_RECORD_NAMESPACES | tail
        0 0 0x125068 [0xa0]: PERF_RECORD_NAMESPACES 17176/17176 - nr_namespaces: 7
        0 0 0x1255b8 [0xa0]: PERF_RECORD_NAMESPACES 17204/17204 - nr_namespaces: 7
        0 0 0x125df0 [0xa0]: PERF_RECORD_NAMESPACES 18939/18939 - nr_namespaces: 7
        0 0 0x125f00 [0xa0]: PERF_RECORD_NAMESPACES 18947/18947 - nr_namespaces: 7
        0 0 0x126010 [0xa0]: PERF_RECORD_NAMESPACES 18974/18974 - nr_namespaces: 7
        0 0 0x126120 [0xa0]: PERF_RECORD_NAMESPACES 19047/19047 - nr_namespaces: 7
        0 0 0x126230 [0xa0]: PERF_RECORD_NAMESPACES 19152/19152 - nr_namespaces: 7
        0 0 0x129330 [0xa0]: PERF_RECORD_NAMESPACES 19154/19154 - nr_namespaces: 7
        0 0 0x12a1f8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
        0 0 0x12b0b8 [0xa0]: PERF_RECORD_NAMESPACES 19155/19155 - nr_namespaces: 7
        #
      
      Humm, investigate why we got two record for the 19155 pid/tid...
      Signed-off-by: NHari Bathini <hbathini@linux.vnet.ibm.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/148891931111.25309.11073854609798681633.stgit@hbathini.in.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e907caf3