1. 29 11月, 2017 2 次提交
  2. 03 10月, 2017 2 次提交
    • K
      perf top: Add option to set the number of thread for event synthesize · 0c6b4994
      Kan Liang 提交于
      Using UINT_MAX to indicate the default thread#, which is the max number
      of online CPU.
      
      Committer testing:
      
        # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9
        # cat /tmp/output
               ? (     ?   ):  ... [continued]: clone()) = 26651 (perf)
           0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf)
           0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf)
           0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf)
           0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf)
           0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf)
           0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf)
           0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf)
           1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf)
         246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0c6b4994
    • K
      perf top: Implement multithreading for perf_event__synthesize_threads · 340b47f5
      Kan Liang 提交于
      The proc files which is sorted with alphabetical order are evenly
      assigned to several synthesize threads to be processed in parallel.
      
      For 'perf top', the threads number hard code to online CPU number. The
      following patch will introduce an option to set it.
      
      For other perf tools, the thread number is 1. Because the process
      function is not ready for multithreading, e.g.
      process_synthesized_event.
      
      This patch series only support event synthesize multithreading for 'perf
      top'. For other tools, it can be done separately later.
      
      With multithread applied, the total processing time can get up to 1.56x
      speedup on Knights Mill for 'perf top'.
      
      For specific single event processing, the processing time could increase
      because of the lock contention. So proc_map_timeout may need to be
      increased. Otherwise some proc maps will be truncated.
      
      Based on my test, increasing the proc_map_timeout has small impact
      on the total processing time. The total processing time still get 1.49x
      speedup on Knights Mill after increasing the proc_map_timeout.
      The patch itself doesn't increase the proc_map_timeout.
      
      Doesn't need to implement multithreading for per task monitoring,
      perf_event__synthesize_thread_map. It doesn't have performance issue.
      
      Committer testing:
      
        # getconf _NPROCESSORS_ONLN
        4
        # perf trace --no-inherit -e clone -o /tmp/output perf top
        # tail -4 /tmp/bla
           0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
           0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
           0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
         246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      340b47f5
  3. 26 7月, 2017 1 次提交
  4. 21 7月, 2017 1 次提交
  5. 19 7月, 2017 1 次提交
    • J
      perf annotate: Check for fused instructions · 69fb09f6
      Jin Yao 提交于
      Macro fusion merges two instructions to a single micro-op. Intel core
      platform performs this hardware optimization under limited
      circumstances.
      
      For example, CMP + JCC can be "fused" and executed /retired together.
      While with sampling this can result in the sample sometimes being on the
      JCC and sometimes on the CMP.  So for the fused instruction pair, they
      could be considered together.
      
      On Nehalem, fused instruction pairs:
      
        cmp/test + jcc.
      
      On other new CPU:
      
        cmp/test/add/sub/and/inc/dec + jcc.
      
      This patch adds an x86-specific function which checks if 2 instructions
      are in a "fused" pair. For non-x86 arch, the function is just NULL.
      
      Changelog:
      
      v4: Move the CPU model checking to symbol__disassemble and save the CPU
          family/model in arch structure.
      
          It avoids checking every time when jump arrow printed.
      
      v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs
          just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC).
      
      v2: Remove the original weak function. Arnaldo points out that doing it
          as a weak function that will be overridden by the host arch doesn't
          work. So now it's implemented as an arch-specific function.
      
      Committer fix:
      
      Do not access evsel->evlist->env->cpuid, ->env can be null, introduce
      perf_evsel__env_cpuid(), just like perf_evsel__env_arch(), also used in
      this function call.
      
      The original patch was segfaulting 'perf top' + annotation.
      
      But this essentially disables this fused instructions augmentation in
      'perf top', the right thing is to get the cpuid from the running kernel,
      left for a later patch tho.
      Signed-off-by: NYao Jin <yao.jin@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1499403995-19857-2-git-send-email-yao.jin@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      69fb09f6
  6. 27 6月, 2017 1 次提交
  7. 20 6月, 2017 1 次提交
  8. 26 4月, 2017 1 次提交
  9. 21 4月, 2017 1 次提交
  10. 20 4月, 2017 2 次提交
  11. 27 3月, 2017 1 次提交
  12. 20 2月, 2017 1 次提交
  13. 14 2月, 2017 1 次提交
  14. 09 2月, 2017 1 次提交
  15. 27 1月, 2017 1 次提交
    • A
      perf tools: Propagate perf_config() errors · ecc4c561
      Arnaldo Carvalho de Melo 提交于
      Previously these were being ignored, sometimes silently.
      
      Stop doing that, emitting debug messages and handling the errors.
      
      Testing it:
      
        $ cat ~/.perfconfig
        cat: /home/acme/.perfconfig: No such file or directory
        $ perf stat -e cycles usleep 1
      
         Performance counter stats for 'usleep 1':
      
                 938,996      cycles:u
      
             0.003813731 seconds time elapsed
      
        $ perf top --stdio
        Error:
        You may not have permission to collect system-wide stats.
      
        Consider tweaking /proc/sys/kernel/perf_event_paranoid,
        <SNIP>
        [ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ]
        [acme@jouet linux]$ perf report --stdio
        # To display the perf.data header info, please use --header/--header-only options.
        # Overhead  Command  Shared Object      Symbol
        # ........  .......  .................  .........................
          71.77%  usleep   libc-2.24.so       [.] _dl_addr
          27.07%  usleep   ld-2.24.so         [.] _dl_next_ld_env_entry
           1.13%  usleep   [kernel.kallsyms]  [k] page_fault
        $
        $ touch ~/.perfconfig
        $ ls -la ~/.perfconfig
        -rw-rw-r--. 1 acme acme 0 Jan 27 12:14 /home/acme/.perfconfig
        $
        $ perf stat -e instructions usleep 1
      
         Performance counter stats for 'usleep 1':
      
                 244,610      instructions:u
      
             0.000805383 seconds time elapsed
      
        $
        [root@jouet ~]# chown acme.acme ~/.perfconfig
        [root@jouet ~]# perf stat -e cycles usleep 1
          Warning: File /root/.perfconfig not owned by current user or root, ignoring it.
      
         Performance counter stats for 'usleep 1':
      
                 937,615      cycles
      
             0.000836931 seconds time elapsed
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-j2rq96so6xdqlr8p8rd6a3jx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ecc4c561
  16. 18 11月, 2016 1 次提交
    • A
      perf annotate: Start supporting cross arch annotation · 786c1b51
      Arnaldo Carvalho de Melo 提交于
      Introduce a 'struct arch', where arch specific stuff will live, starting
      with objdump's choice of comment delimitation character, that is '#' in
      x86 while a ';' in arm.
      
      This has some bits and pieces from a patch submitted by Ravi.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Chris Riyder <chris.ryder@arm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Pawel Moll <pawel.moll@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Cc: Taeung Song <treeze.taeung@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-f337tzjjcl8vtapgvjxmhrbx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      786c1b51
  17. 23 9月, 2016 1 次提交
  18. 05 9月, 2016 2 次提交
  19. 30 8月, 2016 2 次提交
  20. 24 8月, 2016 1 次提交
  21. 02 8月, 2016 2 次提交
  22. 13 7月, 2016 1 次提交
    • A
      tools: Introduce str_error_r() · c8b5f2c9
      Arnaldo Carvalho de Melo 提交于
      The tools so far have been using the strerror_r() GNU variant, that
      returns a string, be it the buffer passed or something else.
      
      But that, besides being tricky in cases where we expect that the
      function using strerror_r() returns the error formatted in a provided
      buffer (we have to check if it returned something else and copy that
      instead), breaks the build on systems not using glibc, like Alpine
      Linux, where musl libc is used.
      
      So, introduce yet another wrapper, str_error_r(), that has the GNU
      interface, but uses the portable XSI variant of strerror_r(), so that
      users rest asured that the provided buffer is used and it is what is
      returned.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-d4t42fnf48ytlk8rjxs822tf@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c8b5f2c9
  23. 23 6月, 2016 2 次提交
  24. 15 6月, 2016 1 次提交
  25. 20 5月, 2016 1 次提交
  26. 06 5月, 2016 3 次提交
  27. 27 4月, 2016 1 次提交
  28. 18 4月, 2016 2 次提交
  29. 12 4月, 2016 1 次提交
  30. 24 3月, 2016 1 次提交