提交 · 340b47f510bbe55a76b7309107276f02ea11f117 · openanolis / cloud-kernel

03 10月, 2017 6 次提交

perf top: Implement multithreading for perf_event__synthesize_threads · 340b47f5

由 Kan Liang 提交于 9月 29, 2017

The proc files which is sorted with alphabetical order are evenly
assigned to several synthesize threads to be processed in parallel.

For 'perf top', the threads number hard code to online CPU number. The
following patch will introduce an option to set it.

For other perf tools, the thread number is 1. Because the process
function is not ready for multithreading, e.g.
process_synthesized_event.

This patch series only support event synthesize multithreading for 'perf
top'. For other tools, it can be done separately later.

With multithread applied, the total processing time can get up to 1.56x
speedup on Knights Mill for 'perf top'.

For specific single event processing, the processing time could increase
because of the lock contention. So proc_map_timeout may need to be
increased. Otherwise some proc maps will be truncated.

Based on my test, increasing the proc_map_timeout has small impact
on the total processing time. The total processing time still get 1.49x
speedup on Knights Mill after increasing the proc_map_timeout.
The patch itself doesn't increase the proc_map_timeout.

Doesn't need to implement multithreading for per task monitoring,
perf_event__synthesize_thread_map. It doesn't have performance issue.

Committer testing:

  # getconf _NPROCESSORS_ONLN
  4
  # perf trace --no-inherit -e clone -o /tmp/output perf top
  # tail -4 /tmp/bla
     0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
     0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
     0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
   246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
  #
Signed-off-by: NKan Liang <kan.liang@intel.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

340b47f5

perf tools: Lock to protect comm_str rb tree · f988e71b

由 Kan Liang 提交于 9月 29, 2017

Add comm_str_lock to protect comm_str rb tree.

The lock is only needed for multithreaded code, so using mutex wrappers
provided by perf tool.
Signed-off-by: NKan Liang <kan.liang@intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f988e71b

perf tools: Lock to protect namespaces and comm list · b32ee9e5

由 Kan Liang 提交于 9月 29, 2017

Add two locks to protect namespaces_list and comm_list.

The lock is only needed for multithreaded code, so using mutex wrappers
provided by perf tool.

Not all the comm_list/namespaces_list accessing are protected, e.g.
thread__exec_comm. Because the multithread code for perf top event
synthesizing does not touch them. They don't need a lock.
Signed-off-by: NKan Liang <kan.liang@intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b32ee9e5

perf test attr: Fix ignored test case result · 22905582

由 Thomas Richter 提交于 9月 13, 2017

Command perf test -v 16 (Setup struct perf_event_attr test) always
reports success even if the test case fails.  It works correctly if you
also specify -F (for don't fork).

   root@s35lp76 perf]# ./perf test -v 16
   15: Setup struct perf_event_attr               :
   --- start ---
   running './tests/attr/test-record-no-delay'
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.002 MB /tmp/tmp4E1h7R/perf.data
     (1 samples) ]
   expected task=0, got 1
   expected precise_ip=0, got 3
   expected wakeup_events=1, got 0
   FAILED './tests/attr/test-record-no-delay' - match failure
   test child finished with 0
   ---- end ----
   Setup struct perf_event_attr: Ok

The reason for the wrong error reporting is the return value of the
system() library call. It is called in run_dir() file tests/attr.c and
returns the exit status, in above case 0xff00.

This value is given as parameter to the exit() function which can only
handle values 0-0xff.

The child process terminates with exit value of 0 and the parent does
not detect any error.

This patch corrects the error reporting and prints the correct test
result.
Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-rdube6rfcjsr1nzue72c7lqn@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

22905582

perf test attr: Fix python error on empty result · 3440fe27

由 Thomas Richter 提交于 9月 13, 2017

Commit d78ada4a ("perf tests attr: Do not store failed events") does
not create an event file in the /tmp directory when the
perf_open_event() system call failed.

This can lead to a situation where not /tmp/event-xx-yy-zz result file
exists at all (for example on a s390x virtual machine environment) where
no CPUMF hardware is available.

The following command then fails with a python call back chain instead
of printing failure:

  [root@s8360046 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
      -p ./perf -v -ttest-stat-basic
  running './tests/attr//test-stat-basic'
  Traceback (most recent call last):
    File "./tests/attr.py", line 379, in <module>
      main()
    File "./tests/attr.py", line 370, in main
      run_tests(options)
    File "./tests/attr.py", line 311, in run_tests
      Test(f, options).run()
    File "./tests/attr.py", line 300, in run
      self.compare(self.expect, self.result)
    File "./tests/attr.py", line 248, in compare
      exp_event.diff(res_event)
  UnboundLocalError: local variable 'res_event' referenced before assignment
  [root@s8360046 perf]#

This patch catches this pitfall and prints an error message instead:

  [root@s8360047 perf]# /usr/bin/python2 ./tests/attr.py -d ./tests/attr/ \
       -p ./perf  -vvv -ttest-stat-basic
  running './tests/attr//test-stat-basic'
    loading expected events
      Event event:base-stat
        fd = 1
        group_fd = -1
        flags = 0|8
        [....]
        sample_regs_user = 0
        sample_stack_user = 0
    'PERF_TEST_ATTR=/tmp/tmpJbMQMP ./perf stat -o /tmp/tmpJbMQMP/perf.data -e cycles kill >/dev/null 2>&1' ret '1', expected '1'
    loading result events
    compare
      matching [event:base-stat]
      match: [event:base-stat] matches []
      res_event is empty
  FAILED './tests/attr//test-stat-basic' - match failure
  [root@s8360047 perf]#
Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
LPU-Reference: 20170913081209.39570-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-04d63nn7svfgxdhi60gq2mlm@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3440fe27

perf tests attr: Fix task term values · 10836d9f

由 Jiri Olsa 提交于 7月 03, 2017

The perf_event_attr::task is 1 by default for first (tracking) event in
the session. Setting task=1 as default and adding task=0 for cases that
need it.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/20170703145030.12903-16-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

10836d9f

29 9月, 2017 2 次提交

perf test: Fix vmlinux failure on s390x part 2 · 5357413f

由 Thomas Richter 提交于 9月 15, 2017

On s390x perf test 1 failed. It turned out that commit cf6383f7
("perf report: Fix kernel symbol adjustment for s390x") was incorrect.

The previous implementation in dso__load_sym() is also suitable for
s390x.

Therefore this patch undoes commit cf6383f7Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Fixes: cf6383f7 ("perf report: Fix kernel symbol adjustment for s390x")
LPU-Reference: 20170915071404.58398-2-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-v101o8k25vuja2ogosgf15yy@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5357413f

perf test: Fix vmlinux failure on s390x · b28503a3

由 Thomas Richter 提交于 9月 15, 2017

On s390x perf test 1 failed. It turned out that commit 4a084ecf
("perf report: Fix module symbol adjustment for s390x") was incorrect.
The previous implementation in dso__load_sym() is also suitable for
s390x.

Therefore this patch undoes commit 4a084ecf.
Signed-off-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
Fixes: 4a084ecf ("perf report: Fix module symbol adjustment for s390x")
LPU-Reference: 20170915071404.58398-1-tmricht@linux.vnet.ibm.com
Link: http://lkml.kernel.org/n/tip-5ani7ly57zji7s0hmzkx416l@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

b28503a3

25 9月, 2017 4 次提交

perf tools: Fix syscalltbl build failure · 090657c9

由 Akemi Yagi 提交于 9月 22, 2017

The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with the error
in tools/perf:

  util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused'
  mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory

Fix it by placing/moving:

  #include <linux/compiler.h>

  outside of #ifdef HAVE_SYSCALL_TABLE block.
Signed-off-by: NAkemi Yagi <toracat@elrepo.org>
Cc: Alan Bartlett <ajb@elrepo.org>
Link: http://lkml.kernel.org/r/oq41r8$1v9$1@blaine.gmane.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

090657c9

perf report: Fix debug messages with --call-graph option · 9789e7e9

由 Mengting Zhang 提交于 9月 23, 2017

With --call-graph option, perf report can display call chains using
type, min percent threshold, optional print limit and order. And the
default call-graph parameter is 'graph,0.5,caller,function,percent'.

Before this patch, 'perf report --call-graph' shows incorrect debug
messages as below:

  # perf report --call-graph
  Invalid callchain mode: 0.5
  Invalid callchain order: 0.5
  Invalid callchain sort key: 0.5
  Invalid callchain config key: 0.5
  Invalid callchain mode: caller
  Invalid callchain mode: function
  Invalid callchain order: function
  Invalid callchain mode: percent
  Invalid callchain order: percent
  Invalid callchain sort key: percent

That is because in function __parse_callchain_report_opt(),each field of
the call-graph parameter is passed to parse_callchain_{mode,order,
sort_key,value} in turn until it meets the matching value.

For example, the order field "caller" is passed to
parse_callchain_mode() firstly and obviously it doesn't match any mode
field. Therefore parse_callchain_mode() will shows the debug message
"Invalid callchain mode: caller", which could confuse users.

The patch fixes this issue by moving the warning out of the function
parse_callchain_{mode,order,sort_key,value}.
Signed-off-by: NMengting Zhang <zhangmengting@huawei.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Krister Johansen <kjlx@templeofstupid.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

9789e7e9

perf evsel: Fix attr.exclude_kernel setting for default cycles:p · f1e52f14

由 Arnaldo Carvalho de Melo 提交于 9月 22, 2017

Yet another fix for probing the max attr.precise_ip setting: it is not
enough settting attr.exclude_kernel for !root users, as they _can_
profile the kernel if the kernel.perf_event_paranoid sysctl is set to
-1, so check that as well.

Testing it:

As non root:

  $ sysctl kernel.perf_event_paranoid
  kernel.perf_event_paranoid = 2
  $ perf record sleep 1
  $ perf evlist -v
  cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...

Now as non-root, but with kernel.perf_event_paranoid set set to the
most permissive value, -1:

  $ sysctl kernel.perf_event_paranoid
  kernel.perf_event_paranoid = -1
  $ perf record sleep 1
  $ perf evlist -v
  cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
  $

I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
     non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.

In both cases, use the highest available precision: attr.precise_ip = 3.
Reported-and-Tested-by: NIngo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: d37a3697 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

f1e52f14

perf tools: Get all of tools/{arch,include}/ in the MANIFEST · 89975bd3

由 Arnaldo Carvalho de Melo 提交于 9月 20, 2017

Now that I'm switching the container builds from using a local volume
pointing to the kernel repository with the perf sources, instead getting
a detached tarball to be able to use a container cluster, some places
broke because I forgot to put some of the required files in
tools/perf/MANIFEST, namely some bitsperlong.h files.

So, to fix it do the same as for tools/build/ and pack the whole
tools/arch/ directory.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

89975bd3

22 9月, 2017 5 次提交

perf tools: Provide mutex wrappers for pthreads rwlocks · 0a7c74ea

由 Arnaldo Carvalho de Melo 提交于 4月 04, 2017

Andi reported a performance drop in single threaded perf tools such as
'perf script' due to the growing number of locks being put in place to
allow for multithreaded tools, so wrap the POSIX threads rwlock routines
with the names used for such kinds of locks in the Linux kernel and then
allow for tools to ask for those locks to be used or not.

I.e. a tool may have a multithreaded phase and then switch to single
threaded, like the upcoming patches for the synthesizing of
PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to
single threaded mode in 'perf top'.

The init routines will not be conditional, this way starting as single
threaded to then move to multi threaded mode should be possible.
Reported-by: NAndi Kleen <ak@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

0a7c74ea

perf tools: Get all of tools/{arch,include}/ in the MANIFEST · 0e1eed80

由 Arnaldo Carvalho de Melo 提交于 9月 20, 2017

Now that I'm switching the container builds from using a local volume
pointing to the kernel repository with the perf sources, instead getting
a detached tarball to be able to use a container cluster, some places
broke because I forgot to put some of the required files in
tools/perf/MANIFEST, namely some bitsperlong.h files.

So, to fix it do the same as for tools/build/ and pack the whole
tools/arch/ directory.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-wmenpjfjsobwdnfde30qqncj@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

0e1eed80

perf trace beauty madvise: Generate 'behavior' string table from kernel headers · 5a54c2f5

由 Arnaldo Carvalho de Melo 提交于 9月 20, 2017

This is one more case where the way that syscall parameter values are
defined in kernel headers are easy to parse using a shell script that
will then generate the string table that gets used by the madvise
'behaviour' argument beautifier.

This way as soon as the header syncronization mechanism in perf's build
system detects a change in a copy of a kernel ABI header and that file
is syncronized, we get 'perf trace' updated automagically.

So, when we syncronize this:

  Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'

We'll get these:

  #define MADV_WIPEONFORK 18              /* Zero memory on fork, child only */
  #define MADV_KEEPONFORK 19              /* Undo MADV_WIPEONFORK */

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-dolb0ghds4ui7wc1npgkchvc@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5a54c2f5

perf tests: Remove Intel CQM perf test · 5c9295bf

由 Xiaochen Shen 提交于 9月 19, 2017

Intel CQM perf test is obsolete for perf PMU code has been removed in
commit c39a0e2c ("x86/perf/cqm: Wipe out perf based cqm").
Signed-off-by: NXiaochen Shen <xiaochen.shen@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Pei P Jia <pei.p.jia@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Link: http://lkml.kernel.org/r/1505797057-16300-1-git-send-email-xiaochen.shen@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5c9295bf

perf stat: Fix adding multiple event groups · 411bc316

由 Andi Kleen 提交于 9月 14, 2017

The -M metric group parser threw away the events of earlier groups when
multiple groups were specified. Fix this here by not overwriting the
string incorrectly.

Now this works correctly:

% perf stat -M Summary,SMT --metric-only -a sleep 1

 Performance counter stats for 'system wide':

Instructions CPI CLKS         CPU_Utilization GFLOPs SMT_2T_Utilization SMT_2T_Utilization Kernel_Utilization CoreIPC CORE_CLKS
900907376.0  2.7 2398954144.0 0.1             0.0    0.2                0.2                0.1                0.4     2080822855.5

while previously it would only show the SMT metrics.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170914205735.18431-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

411bc316

18 9月, 2017 4 次提交

perf tools: Fix leaking rec_argv in error cases · c896f85a

由 Martin Kepplinger 提交于 9月 13, 2017

Let's free the allocated rec_argv in case we return early, in order to
avoid leaking memory.

This adds free() at a few very similar places across the tree where it
was missing.
Signed-off-by: NMartin Kepplinger <martink@posteo.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Martin kepplinger <martink@posteo.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170913191419.29806-1-martink@posteo.deSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

c896f85a

perf pmu: Improve error messages for missing PMUs · 333b5665

由 Andi Kleen 提交于 9月 13, 2017

When a PMU is missing print a better error message mentioning
the missing PMU.

% mkdir empty
% mount --bind empty /sys/devices/msr
% perf stat -M Summary true
event syntax error: '{inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_comp_ops_exe.sse_scalar..'
                     \___ Cannot find PMU `msr'. Missing kernel support?

It still cannot find the right column for aliases, but it's already a vast improvement.

v2: Check asprintf
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170913215006.32222-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

333b5665

perf machine: Optimize a bit the machine__findnew_thread() methods · 75e45e43

由 Arnaldo Carvalho de Melo 提交于 9月 14, 2017

In some cases we already have calculated the hash bucket, so reuse it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-800zehjsyy03er4s4jf0e99v@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

75e45e43

perf machine: Use hashtable for machine threads · 91e467bc

由 Kan Liang 提交于 9月 10, 2017

To process any events, it needs to find the thread in the machine first.
The machine maintains a rb tree to store all threads. The rb tree is
protected by a rw lock.

It is not a problem for current perf which serially processing events.
However, it will have scalability performance issue to process events in
parallel, especially on a heavy load system which have many threads.

Introduce a hashtable to divide the big rb tree into many samll rb tree
for threads. The index is thread id % hashtable size. It can reduce the
lock contention.

Committer notes:

Renamed some variables and function names to reduce semantic confusion:

  'struct threads' pointers: thread -> threads
  threads hastable index: tid -> hash_bucket
  struct threads *machine__thread() -> machine__threads()
  Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)
Signed-off-by: NKan Liang <kan.liang@intel.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

91e467bc

14 9月, 2017 1 次提交

mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4

由 Michal Hocko 提交于 9月 13, 2017

GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation.  As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.

The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory.  So
this is rather misleading and hard to evaluate for any benefits.

I have checked some random users and none of them has added the flag
with a specific justification.  I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring.  This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.

I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL.  Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.

I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.

This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
seems to be a heuristic without any measured advantage for most (if not
all) its current users.  The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers.  So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.

[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org

[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
  Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Acked-by: NMel Gorman <mgorman@suse.de>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ee931c4

13 9月, 2017 18 次提交

perf vendor events: Add JSON metrics for Skylake server · 56de5b63

由 Andi Kleen 提交于 9月 05, 2017

Add JSON metrics for Skylake server
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

56de5b63

perf vendor events: Add JSON metrics for Broadwell DE · 69e93213

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Broadwell DE
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

69e93213

perf vendor events: Add JSON metrics for Broadwell Server · 6d75abd3

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Broadwell Server.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

6d75abd3

perf vendor events: Add JSON metrics for Haswell EP · 5e49f732

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Haswell EP.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5e49f732

perf vendor events: Add JSON metrics for Ivy Town · 43fd36a1

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Ivy Town.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

43fd36a1

perf vendor events: Add JSON metrics for Haswell · 2099f51d

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Haswell.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

2099f51d

perf vendor events: Add JSON metrics for Ivy Bridge · 8853d2de

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Ivy Bridge.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

8853d2de

perf vendor events: Add JSON metrics for Sandy Bridge EP · 28bc0ddb

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Sandy Bridge EP.
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170908180133.GA20128@tassilo.jf.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

28bc0ddb

perf vendor events: Add JSON metrics for Sandy Bridge · 97dca671

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Sandy Bridge.

Committer testing:

  # grep "model name" /proc/cpuinfo | head -1
  model name	: Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz
    # perf list metricgroup

  List of pre-defined events (to be used in -e):

  Metric Groups:

  DSB
  FLOPS
  Frontend
  Frontend_Bandwidth
  Pipeline
  Ports_Utilization
  Power
  SMT
  Summary
  TopDownL1
  # perf stat -M Power --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Turbo_Utilization  C3_Core_Residency  C6_Core_Residency  C7_Core_Residency  C2_Pkg_Residency  C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
     0.8               0.0                98.1               0.0                0.0               0.0               23.4              0.0

       1.001153658 seconds time elapsed

  # perf stat -v -M Power --metric-only -a sleep 1
  Using CPUID GenuineIntel-6-2A
  metric expr cpu_clk_unhalted.thread / cpu_clk_unhalted.ref_tsc for Turbo_Utilization
  found event cpu_clk_unhalted.thread
  found event cpu_clk_unhalted.ref_tsc
  metric expr (cstate_core@c3\-residency@ / msr@tsc@) * 100 for C3_Core_Residency
  found event cstate_core/c3-residency/
  found event msr/tsc/
  metric expr (cstate_core@c6\-residency@ / msr@tsc@) * 100 for C6_Core_Residency
  found event cstate_core/c6-residency/
  found event msr/tsc/
  metric expr (cstate_core@c7\-residency@ / msr@tsc@) * 100 for C7_Core_Residency
  found event cstate_core/c7-residency/
  found event msr/tsc/
  metric expr (cstate_pkg@c2\-residency@ / msr@tsc@) * 100 for C2_Pkg_Residency
  found event cstate_pkg/c2-residency/
  found event msr/tsc/
  metric expr (cstate_pkg@c3\-residency@ / msr@tsc@) * 100 for C3_Pkg_Residency
  found event cstate_pkg/c3-residency/
  found event msr/tsc/
  metric expr (cstate_pkg@c6\-residency@ / msr@tsc@) * 100 for C6_Pkg_Residency
  found event cstate_pkg/c6-residency/
  found event msr/tsc/
  metric expr (cstate_pkg@c7\-residency@ / msr@tsc@) * 100 for C7_Pkg_Residency
  found event cstate_pkg/c7-residency/
  found event msr/tsc/
  adding {cpu_clk_unhalted.thread,cpu_clk_unhalted.ref_tsc}:W,{cstate_core/c3-residency/,msr/tsc/}:W,{cstate_core/c6-residency/,msr/tsc/}:W,{cstate_core/c7-residency/,msr/tsc/}:W,{cstate_pkg/c2-residency/,msr/tsc/}:W,{cstate_pkg/c3-residency/,msr/tsc/}:W,{cstate_pkg/c6-residency/,msr/tsc/}:W,{cstate_pkg/c7-residency/,msr/tsc/}:W
  cpu_clk_unhalted.thread -> cpu/event=0x3c/
  cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
  Weak group for cstate_pkg/c2-residency//2 failed
  Weak group for cstate_pkg/c3-residency//2 failed
  Weak group for cstate_pkg/c6-residency//2 failed
  Weak group for cstate_pkg/c7-residency//2 failed
  cpu_clk_unhalted.thread: 5564185 4002833569 4002833569
  cpu_clk_unhalted.ref_tsc: 7325424 4002833569 4002833569
  cstate_core/c3-residency/: 68293 4003027101 4003027101
  msr/tsc/: 12451294472 4003027101 4003027101
  cstate_core/c6-residency/: 12238830163 4003260984 4003260984
  msr/tsc/: 12452017806 4003260984 4003260984
  cstate_core/c7-residency/: 0 4003489648 4003489648
  msr/tsc/: 12452725162 4003489648 4003489648
  cstate_pkg/c2-residency/: 1830054 1000913138 1000913138
  msr/tsc/: 12453441079 4003717513 4003717513
  cstate_pkg/c3-residency/: 0 1000973570 1000973570
  msr/tsc/: 12454177865 4003954758 4003954758
  cstate_pkg/c6-residency/: 2940448859 1001032370 1001032370
  msr/tsc/: 12454833890 4004166118 4004166118
  cstate_pkg/c7-residency/: 0 1001049818 1001049818
  msr/tsc/: 12454919470 4004194204 4004194204

   Performance counter stats for 'system wide':

  Turbo_Utilization  C3_Core_Residency  C6_Core_Residency  C7_Core_Residency  C2_Pkg_Residency  C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
       0.8             0.0                98.3               0.0                0.0               0.0               23.6              0.0

         1.001126519 seconds time elapsed

  #
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

97dca671

perf vendor events: Add JSON metrics for Skylake · 2e006a24

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Skylake.

Committer testing:

  # grep "model name" /proc/cpuinfo | head -1
  model name	: Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz
  # uname -a
  Linux seventh 4.12.0-rc6+ #1 SMP Fri Jun 30 16:40:55 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
  # perf stat --metric-only -M Summary -a sleep 1

   Performance counter stats for 'system wide':

  Instructions         CPI                  CLKS                 CPU_Utilization      GFLOPs               SMT_2T_Utilization   Kernel_Utilization
  34021097.0               0.0            119424171.0              0.0                 0.0                 0.0                 0.0

         1.001001793 seconds time elapsed

  # perf list metricgroup

  List of pre-defined events (to be used in -e):

  Metric Groups:

  DSB
  FLOPS
  Frontend
  Frontend_Bandwidth
  Memory_BW
  Memory_Bound
  Memory_Lat
  Pipeline
  Ports_Utilization
  Power
  SMT
  Summary
  TLB
  TopDownL1
  Unknown_Branches
  # perf stat --metric-only -M Ports_Utilization -a sleep 1

   Performance counter stats for 'system wide':

  ILP
  1475828.0

       1.000688547 seconds time elapsed

  # perf stat -v --metric-only -M Ports_Utilization -a sleep 1
  Using CPUID GenuineIntel-6-9E
  metric expr uops_executed.thread / ( uops_executed.core_cycles_ge_1 / 2) if #smt_on else uops_executed.core_cycles_ge_1 for ILP
  found event uops_executed.thread
  found event uops_executed.core_cycles_ge_1
  adding {uops_executed.thread,uops_executed.core_cycles_ge_1}:W
  uops_executed.thread -> cpu/umask=0x1,period=2000003,event=0xb1/
  uops_executed.core_cycles_ge_1 -> cpu/umask=0x2,period=2000003,cmask=1,event=0xb1/
  uops_executed.thread: 8115271 4002547654 4002547654
  uops_executed.core_cycles_ge_1: 3282969 4002547654 4002547654

   Performance counter stats for 'system wide':

  ILP
  3282969.0

         1.000719870 seconds time elapsed

  #
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

2e006a24

perf vendor events: Add JSON metrics for Broadwell · cf979623

由 Andi Kleen 提交于 7月 23, 2017

Add JSON metrics for Broadwell.

Commiter testing:

  # uname -a
  Linux jouet 4.13.0-rc7+ #3 SMP Sat Sep 2 09:04:44 -03 2017 x86_64 x86_64 x86_64 GNU/Linux
  # grep "model name" /proc/cpuinfo  | head -1
  model name	: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
  # perf list metricgroup

  List of pre-defined events (to be used in -e):

  Metric Groups:

  DSB
  FLOPS
  Frontend
  Frontend_Bandwidth
  Memory_BW
  Memory_Bound
  Memory_Lat
  Pipeline
  Ports_Utilization
  Power
  SMT
  Summary
  TLB
  TopDownL1
  Unknown_Branches
  # perf stat -M Power --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Turbo_Utilization  C3_Core_Residency  C6_Core_Residency  C7_Core_Residency  C2_Pkg_Residency  C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
       1.1               0.0                 0.0               0.0                0.0               0.0               0.0               0.0

         1.003502904 seconds time elapsed

  #
  # perf stat -M Memory_BW --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  MLP
       1.7

         1.001364525 seconds time elapsed

  #
  # perf stat -M TLB --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Page_Walks_Utilization
       0.1

         1.005962198 seconds time elapsed

  #
  # perf stat -M Summary --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Instructions   CPI          CLKS          CPU_Utilization   GFLOPs  SMT_2T_Utilization  Kernel_Utilization
  7281856697.0       0.0    11150898087.0     1.0              0.0    1.0                 0.7

         1.012134025 seconds time elapsed

  #

Running in verbose mode shows which counters and expressions are being
used:

  # perf stat -v -M Summary --metric-only -a sleep 1
  Using CPUID GenuineIntel-6-3D
  metric expr 1 / inst_retired.any / cycles for CPI
  found event inst_retired.any
  found event cycles
  metric expr cpu_clk_unhalted.thread for CLKS
  found event cpu_clk_unhalted.thread
  metric expr inst_retired.any for Instructions
  found event inst_retired.any
  metric expr cpu_clk_unhalted.ref_tsc / msr@tsc@ for CPU_Utilization
  found event cpu_clk_unhalted.ref_tsc
  found event msr/tsc/
  metric expr ( 1*( fp_arith_inst_retired.scalar_single + fp_arith_inst_retired.scalar_double ) + 2* fp_arith_inst_retired.128b_packed_double + 4*( fp_arith_inst_retired.128b_packed_single + fp_arith_inst_retired.256b_packed_double ) + 8* fp_arith_inst_retired.256b_packed_single ) / 1000000000 / duration_time for GFLOPs
  found event fp_arith_inst_retired.scalar_single
  found event fp_arith_inst_retired.scalar_double
  found event fp_arith_inst_retired.128b_packed_double
  found event fp_arith_inst_retired.128b_packed_single
  found event fp_arith_inst_retired.256b_packed_double
  found event fp_arith_inst_retired.256b_packed_single
  found event duration_time
  metric expr 1 - cpu_clk_thread_unhalted.one_thread_active / ( cpu_clk_thread_unhalted.ref_xclk_any / 2 ) if #smt_on else 0 for SMT_2T_Utilization
  found event cpu_clk_thread_unhalted.one_thread_active
  found event cpu_clk_thread_unhalted.ref_xclk_any
  metric expr cpu_clk_unhalted.ref_tsc:u / cpu_clk_unhalted.ref_tsc for Kernel_Utilization
  found event cpu_clk_unhalted.ref_tsc:u
  found event cpu_clk_unhalted.ref_tsc
  adding {inst_retired.any,cycles}:W,{cpu_clk_unhalted.thread}:W,{inst_retired.any}:W,{cpu_clk_unhalted.ref_tsc,msr/tsc/}:W,{fp_arith_inst_retired.scalar_single,fp_arith_inst_retired.scalar_double,fp_arith_inst_retired.128b_packed_double,fp_arith_inst_retired.128b_packed_single,fp_arith_inst_retired.256b_packed_double,fp_arith_inst_retired.256b_packed_single,duration_time}:W,{cpu_clk_thread_unhalted.one_thread_active,cpu_clk_thread_unhalted.ref_xclk_any}:W,{cpu_clk_unhalted.ref_tsc:u,cpu_clk_unhalted.ref_tsc}:W
  inst_retired.any -> cpu/event=0xc0/
  cpu_clk_unhalted.thread -> cpu/event=0x3c/
  inst_retired.any -> cpu/event=0xc0/
  cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
  fp_arith_inst_retired.scalar_single -> cpu/umask=0x2,period=2000003,event=0xc7/
  fp_arith_inst_retired.scalar_double -> cpu/umask=0x1,period=2000003,event=0xc7/
  fp_arith_inst_retired.128b_packed_double -> cpu/umask=0x4,period=2000003,event=0xc7/
  fp_arith_inst_retired.128b_packed_single -> cpu/umask=0x8,period=2000003,event=0xc7/
  fp_arith_inst_retired.256b_packed_double -> cpu/umask=0x10,period=2000003,event=0xc7/
  fp_arith_inst_retired.256b_packed_single -> cpu/umask=0x20,period=2000003,event=0xc7/
  cpu_clk_thread_unhalted.one_thread_active -> cpu/umask=0x2,period=2000003,event=0x3c/
  cpu_clk_thread_unhalted.ref_xclk_any -> cpu/umask=0x1,any=1,period=2000003,event=0x3c/
  cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
  cpu_clk_unhalted.ref_tsc -> cpu/umask=0x3,period=2000003,event=0/
  Weak group for fp_arith_inst_retired.scalar_single/7 failed
  Weak group for cpu_clk_unhalted.ref_tsc:u/2 failed
  inst_retired.any: 8704146437 4026374016 619883741
  cycles: 11180800018 4026374016 619883741
  cpu_clk_unhalted.thread: 11140030295 4026323772 931621933
  inst_retired.any: 8643115117 4026260510 1243595906
  cpu_clk_unhalted.ref_tsc: 10201638510 4026184297 1247351077
  msr/tsc/: 10378022785 4026184297 1247351077
  fp_arith_inst_retired.scalar_single: 134697 4026102728 1559210545
  fp_arith_inst_retired.scalar_double: 274339 4026007348 1870014984
  fp_arith_inst_retired.128b_packed_double: 1639 4025886054 1866736918
  fp_arith_inst_retired.128b_packed_single: 0 4025776614 2175106569
  fp_arith_inst_retired.256b_packed_double: 0 4025681734 1235551129
  fp_arith_inst_retired.256b_packed_single: 0 4025582962 1232398454
  duration_time: 0 4025552913 4025552913
  cpu_clk_thread_unhalted.one_thread_active: 10505 4025474649 923893076
  cpu_clk_thread_unhalted.ref_xclk_any: 394992110 4025474649 923893076
  cpu_clk_unhalted.ref_tsc:u: 5341421014 4025360315 1231634198
  cpu_clk_unhalted.ref_tsc: 10258278508 4025252611 307909362

   Performance counter stats for 'system wide':

  Instructions         CPI                  CLKS                 CPU_Utilization      GFLOPs               SMT_2T_Utilization   Kernel_Utilization
  8704146437.0             0.0            11140030295.0            1.0                 0.0                 1.0                 0.5

         1.006783654 seconds time elapsed

  #
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20170905195235.GW2482@two.firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

cf979623

perf stat: Fall weak group back even for EBADF · 35c1980e

由 Andi Kleen 提交于 9月 05, 2017

It's not possible to run a package event and a per cpu event in the same
group. This is used by some of the power metrics.  They work correctly
when not using a group.

Normally weak groups should handle that, but in this case EBADF is
returned instead of the normal EINVAL.

  $ strace -e perf_event_open ./perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
  Using CPUID GenuineIntel-6-3E
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, PERF_FLAG_FD_CLOEXEC) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = -1 EINVAL (Invalid argument)
  perf_event_open({type=0x17 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, -1, 0) = 3
  perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 0, 3, 0) = 4
  perf_event_open({type=0x7 /* PERF_TYPE_??? */, size=PERF_ATTR_SIZE_VER5, config=0, ...}, -1, 1, 0, 0) = -1 EBADF (Bad file descriptor)

and perf errors out.

Make weak groups trigger a fall back for EBADF too. Then this case works correctly:

  $ perf stat -v -e '{cstate_pkg/c2-residency/,msr/tsc/}:W' -a sleep 1
  Using CPUID GenuineIntel-6-3E
  Weak group for cstate_pkg/c2-residency//2 failed
  cstate_pkg/c2-residency/: 476709882 1000598460 1000598460
  msr/tsc/: 39625837911 12007369110 12007369110

   Performance counter stats for 'system wide':

         476,709,882      cstate_pkg/c2-residency/
      39,625,837,911      msr/tsc/

         1.000697588 seconds time elapsed

  This fixes perf stat -M Power ...

  $ perf stat -M Power --metric-only -a sleep 1

   Performance counter stats for 'system wide':

  Turbo_Utilization  C3_Core_Residency  C6_Core_Residency C7_Core_Residency  C2_Pkg_Residency   C3_Pkg_Residency  C6_Pkg_Residency  C7_Pkg_Residency
       1.0                 0.7                30.0               0.0               0.9                 0.1               0.4                 0.0

         1.001240740 seconds time elapsed
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170905211324.32427-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

35c1980e

perf tools: Make copyfile_offset() static · c23c2a0f

由 Arnaldo Carvalho de Melo 提交于 9月 11, 2017

There are no usage outside util.c and this is the only remaining reason
for fcntl.h to be included in util.h, to get the loff_t definition in
Alpine Linux, so make it static.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-2dzlsao7k6ihozs5karw6kpx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

c23c2a0f

perf config: Allow creating empty config set for config file autogeneration · 55421b4f

由 Taeung Song 提交于 9月 07, 2017

When there isn't a config file (e.g. ~/.perfconfig) or it has nothing,
the config set wasn't created.

If the config set does not exist, a config file can't be autogenerated.

So allow creating a empty config set in the above case,
then we can support the config file autogeneration.

Before:

  $ rm -f ~/.perfconfig
  $ perf config --user report.children=false

  $ cat ~/.perfconfig
  cat: /root/.perfconfig: No such file or directory

But I think it should work even if there isn't a config file.

After:

  $ rm -f ~/.perfconfig
  $ perf config --user report.children=false

  $ cat ~/.perfconfig
  # this file is auto-generated.
  [report]
      children = false

NOTE:

As a result, if perf_config_set__init() fails, it looks as if the config
set isn't freed. But it isn't a problem.  Because the config set will be
freed by perf_config_set__delete() at the end of cmd_config().
Signed-off-by: NTaeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754336-9824-1-git-send-email-treeze.taeung@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

55421b4f

perf config: Write a config file just once · 5c261555

由 Taeung Song 提交于 9月 07, 2017

Currently set_config() can be repeatedly called for each input config on
the below case:

  $ perf config kmem.default=slab report.children=false ...

But it's a waste, so only once write a config file gathering all given
config key=value pairs.
Signed-off-by: NTaeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754331-9776-1-git-send-email-treeze.taeung@gmail.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

5c261555

perf tools: Use scandir() to replace readdir() · ecdad24d

由 Kan Liang 提交于 9月 07, 2017

In perf_event__synthesize_threads() perf goes through all proc files
serially by readdir.

scandir() does a snapshoot of /proc, which is multithreading friendly.

It's possible that some threads which are added during event synthesize.
But the number of lost threads should be small.  They should not impact
the final analysis.
Signed-off-by: NKan Liang <kan.liang@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1504806954-150842-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ecdad24d

perf ui progress: Add size info into progress bar · 8233822f

由 Jiri Olsa 提交于 9月 08, 2017

Adding the size values '[current/total]' into progress bar, to show more
detailed progress of data reading.

Adding new ui_progress__init_size function to specify we want to display
the size.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-5-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

8233822f

perf ui progress: Add ui specific init function · 25cc4eb4

由 Jiri Olsa 提交于 9月 08, 2017

Adding ui specific init function allowing to setup the progress bar
width based on current screen scales.

Adding TUI init function to get more grained update of the progress bar.
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-4-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

25cc4eb4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功