1. 24 10月, 2017 8 次提交
  2. 23 10月, 2017 3 次提交
  3. 18 10月, 2017 1 次提交
    • J
      perf xyarray: Fix wrong processing when closing evsel fd · 3d8bba95
      Jin Yao 提交于
      In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y()
      returns max_x.
      
      It's confusing and for code logic it looks not correct.
      
      Error happens when closing evsel fd. Let's see this scenario:
      
      1. Allocate an fd (pseudo-code)
      
        perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
        {
      	evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int));
        }
      
        xyarray__new(int xlen, int ylen, size_t entry_size)
        {
      	size_t row_size = ylen * entry_size;
      	struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size);
      
      	xy->entry_size = entry_size;
      	xy->row_size   = row_size;
      	xy->entries    = xlen * ylen;
      	xy->max_x      = xlen;
      	xy->max_y      = ylen;
      	......
        }
      
      So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4.
      
      2. Use perf syscall and get the fd
      
        int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
      		     struct thread_map *threads)
        {
      	for (cpu = 0; cpu < cpus->nr; cpu++) {
      
      		for (thread = 0; thread < nthreads; thread++) {
      			int fd, group_fd;
      
      			fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu],
      						 group_fd, flags);
      
      			FD(evsel, cpu, thread) = fd;
      	}
        }
      
        static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
        {
      	return &xy->contents[x * xy->row_size + y * xy->entry_size];
        }
      
      These codes don't have issues. The issue happens in the closing of fd.
      
      3. Close fd.
      
        void perf_evsel__close_fd(struct perf_evsel *evsel)
        {
      	int cpu, thread;
      
      	for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
      		for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
      			close(FD(evsel, cpu, thread));
      			FD(evsel, cpu, thread) = -1;
      		}
        }
      
        Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y()
        returns max_x (ncpus), so above code is actually to be:
      
              for (cpu = 0; cpu < nthreads; cpu++)
                      for (thread = 0; thread < ncpus; ++thread) {
                              close(FD(evsel, cpu, thread));
                              FD(evsel, cpu, thread) = -1;
                      }
      
        It's not correct!
      
      This change is introduced by "475fb533" ("perf evsel: Fix buffer overflow
      while freeing events")
      
      This fix is to let xyarray__max_x() return max_x (ncpus) and
      let xyarry__max_y() return max_y (nthreads)
      
      Committer note:
      
      This was also fixed by Ravi Bangoria, who provided the same patch,
      noticing the problem with 'perf record':
      
      <quote Ravi>
      I see 'perf record -p <pid>' crashes with following log:
      
         *** Error in `./perf': free(): invalid next size (normal): 0x000000000298b340 ***
         ======= Backtrace: =========
         /lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f7fd85c87e5]
         /lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f7fd85d137a]
         /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f7fd85d553c]
         ./perf(perf_evsel__close+0xb4)[0x4b7614]
         ./perf(perf_evlist__delete+0x100)[0x4ab180]
         ./perf(cmd_record+0x1d9)[0x43a5a9]
         ./perf[0x49aa2f]
         ./perf(main+0x631)[0x427841]
         /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f7fd8571830]
         ./perf(_start+0x29)[0x427a59]
      </>
      Signed-off-by: NJin Yao <yao.jin@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
      Fixes: d74be476 ("perf xyarray: Save max_x, max_y")
      Link: http://lkml.kernel.org/r/1508339478-26674-1-git-send-email-yao.jin@linux.intel.com
      Link: http://lkml.kernel.org/r/1508327446-15302-1-git-send-email-ravi.bangoria@linux.vnet.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3d8bba95
  4. 17 10月, 2017 1 次提交
    • N
      perf buildid-list: Fix crash when processing PERF_RECORD_NAMESPACE · 7f0cd236
      Namhyung Kim 提交于
      Thomas reported that 'perf buildid-list' gets a SEGFAULT due to NULL
      pointer deref when he ran it on a data with namespace events.  It was
      because the buildid_id__mark_dso_hit_ops lacks the namespace event
      handler and perf_too__fill_default() didn't set it.
      
        Program received signal SIGSEGV, Segmentation fault.
        0x0000000000000000 in ?? ()
        Missing separate debuginfos, use: dnf debuginfo-install audit-libs-2.7.7-1.fc25.s390x bzip2-libs-1.0.6-21.fc25.s390x elfutils-libelf-0.169-1.fc25.s390x
        +elfutils-libs-0.169-1.fc25.s390x libcap-ng-0.7.8-1.fc25.s390x numactl-libs-2.0.11-2.ibm.fc25.s390x openssl-libs-1.1.0e-1.1.ibm.fc25.s390x perl-libs-5.24.1-386.fc25.s390x
        +python-libs-2.7.13-2.fc25.s390x slang-2.3.0-7.fc25.s390x xz-libs-5.2.3-2.fc25.s390x zlib-1.2.8-10.fc25.s390x
        (gdb) where
        #0  0x0000000000000000 in ?? ()
        #1  0x00000000010fad6a in machines__deliver_event (machines=<optimized out>, machines@entry=0x2c6fd18,
            evlist=<optimized out>, event=event@entry=0x3fffdf00470, sample=0x3ffffffe880, sample@entry=0x3ffffffe888,
            tool=tool@entry=0x1312968 <build_id.mark_dso_hit_ops>, file_offset=1136) at util/session.c:1287
        #2  0x00000000010fbf4e in perf_session__deliver_event (file_offset=1136, tool=0x1312968 <build_id.mark_dso_hit_ops>,
            sample=0x3ffffffe888, event=0x3fffdf00470, session=0x2c6fc30) at util/session.c:1340
        #3  perf_session__process_event (session=0x2c6fc30, session@entry=0x0, event=event@entry=0x3fffdf00470,
            file_offset=file_offset@entry=1136) at util/session.c:1522
        #4  0x00000000010fddde in __perf_session__process_events (file_size=11880, data_size=<optimized out>,
            data_offset=<optimized out>, session=0x0) at util/session.c:1899
        #5  perf_session__process_events (session=0x0, session@entry=0x2c6fc30) at util/session.c:1953
        #6  0x000000000103b2ac in perf_session__list_build_ids (with_hits=<optimized out>, force=<optimized out>)
            at builtin-buildid-list.c:83
        #7  cmd_buildid_list (argc=<optimized out>, argv=<optimized out>) at builtin-buildid-list.c:115
        #8  0x00000000010a026c in run_builtin (p=0x1311f78 <commands+24>, argc=argc@entry=2, argv=argv@entry=0x3fffffff3c0)
            at perf.c:296
        #9  0x000000000102bc00 in handle_internal_command (argv=<optimized out>, argc=2) at perf.c:348
        #10 run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:392
        #11 main (argc=<optimized out>, argv=0x3fffffff3c0) at perf.c:536
        (gdb)
      
      Fix it by adding a stub event handler for namespace event.
      
      Committer testing:
      
      Further clarifying, plain using 'perf buildid-list' will not end up in a
      SEGFAULT when processing a perf.data file with namespace info:
      
        # perf record -a --namespaces sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.024 MB perf.data (1058 samples) ]
        # perf buildid-list | wc -l
        38
        # perf buildid-list | head -5
        e2a171c7b905826fc8494f0711ba76ab6abbd604 /lib/modules/4.14.0-rc3+/build/vmlinux
        874840a02d8f8a31cedd605d0b8653145472ced3 /lib/modules/4.14.0-rc3+/kernel/arch/x86/kvm/kvm-intel.ko
        ea7223776730cd8a22f320040aae4d54312984bc /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/i915/i915.ko
        5961535e6732a8edb7f22b3f148bb2fa2e0be4b9 /lib/modules/4.14.0-rc3+/kernel/drivers/gpu/drm/drm.ko
        f045f54aa78cf1931cc893f78b6cbc52c72a8cb1 /usr/lib64/libc-2.25.so
        #
      
      It is only when one asks for checking what of those entries actually had
      samples, i.e. when we use either -H or --with-hits, that we will process
      all the PERF_RECORD_ events, and since tools/perf/builtin-buildid-list.c
      neither explicitely set a perf_tool.namespaces() callback nor the
      default stub was set that we end up, when processing a
      PERF_RECORD_NAMESPACE record, causing a SEGFAULT:
      
        # perf buildid-list -H
        Segmentation fault (core dumped)
        ^C
        #
      Reported-and-Tested-by: NThomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Fixes: f3b3614a ("perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info")
      Link: http://lkml.kernel.org/r/20171017132900.11043-1-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      7f0cd236
  5. 14 10月, 2017 1 次提交
    • J
      perf tools: Check wether the eBPF file exists in event parsing · 29479bfe
      Jiri Olsa 提交于
      Adding the check wether the eBPF file exists, to consider it
      as eBPF input file. This way we can differentiate eBPF events
      from events that end up with same suffix as eBPF file.
      
      Before:
      
        $ perf stat -e 'cpu/uops_executed.core/'  true
        bpf: builtin compilation failed: -95, try external compiler
        WARNING:        unable to get correct kernel building directory.
        Hint:   Set correct kbuild directory using 'kbuild-dir' option in [llvm]
                section of ~/.perfconfig or set it to "" to suppress kbuild
                detection.
      
        event syntax error: 'cpu/uops_executed.core/'
                             \___ Failed to load cpu/uops_executed.c from source: 'version' section incorrect or lost
      
      After:
      
        $ perf stat -e 'cpu/uops_executed.core/'  true
      
         Performance counter stats for 'true':
      
                   181,533      cpu/uops_executed.core/:u
      
               0.002795447 seconds time elapsed
      
      If user makes type in the eBPF file, we prioritize the event syntax
      and show following warning:
      
        $ perf stat -e 'krava.c//'  true
        event syntax error: 'krava.c//'
                             \___ Cannot find PMU `krava.c'. Missing kernel support?
      Reported-and-Tested-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Changbin Du <changbin.du@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20171013083736.15037-9-jolsa@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      29479bfe
  6. 10 10月, 2017 1 次提交
    • M
      perf pmu: Unbreak perf record for arm/arm64 with events with explicit PMU · 66ec1191
      Mark Rutland 提交于
      Currently, perf record is broken on arm/arm64 systems when the PMU is
      specified explicitly as part of the event, e.g.
      
      $ ./perf record -e armv8_cortex_a53/cpu_cycles/u true
      
      In such cases, perf record fails to open events unless
      perf_event_paranoid is set to -1, even if the PMU in question supports
      mode exclusion. Further, even when perf_event_paranoid is toggled, no
      samples are recorded.
      
      This is an unintended side effect of commit:
      
        e3ba76de ("perf tools: Force uncore events to system wide monitoring)
      
      ... which assumes that if a PMU has an associated cpu_map, it is an
      uncore PMU, and forces events for such PMUs to be system-wide.
      
      This is not true for arm/arm64 systems, which can have heterogeneous
      CPUs. To account for this, multiple CPU PMUs are exposed, each with a
      "cpus" field under sysfs, which the perf tool parses into a cpu_map. ARM
      PMUs do not have a "cpumask" file, and only have a "cpus" file. For the
      gory details as to why, see commit:
      
       7e3fcffe ("perf pmu: Support alternative sysfs cpumask")
      
      Given all of this, we can instead identify uncore PMUs by explicitly
      checking for a "cpumask" file, and restore arm/arm64 PMU support back to
      a working state. This patch does so, adding a new perf_pmu::is_uncore
      field, and splitting the existing cpumask parsing so that it can be
      reused.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by Will Deacon <will.deacon@arm.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: 4.12+ <stable@vger.kernel.org>
      Fixes: e3ba76de ("perf tools: Force uncore events to system wide monitoring)
      Link: http://lkml.kernel.org/r/1507315102-5942-1-git-send-email-mark.rutland@arm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      66ec1191
  7. 05 10月, 2017 1 次提交
  8. 03 10月, 2017 4 次提交
    • K
      perf top: Add option to set the number of thread for event synthesize · 0c6b4994
      Kan Liang 提交于
      Using UINT_MAX to indicate the default thread#, which is the max number
      of online CPU.
      
      Committer testing:
      
        # perf trace --no-inherit -e clone -o /tmp/output perf top --num-thread-synthesize 9
        # cat /tmp/output
               ? (     ?   ):  ... [continued]: clone()) = 26651 (perf)
           0.059 ( 0.010 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfac44f30, parent_tidptr: 0x7f5bfac459d0, child_tidptr: 0x7f5bfac459d0, tls: 0x7f5bfac45700) = 26652 (perf)
           0.116 ( 0.014 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bfa443f30, parent_tidptr: 0x7f5bfa4449d0, child_tidptr: 0x7f5bfa4449d0, tls: 0x7f5bfa444700) = 26653 (perf)
           0.141 ( 0.009 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9c42f30, parent_tidptr: 0x7f5bf9c439d0, child_tidptr: 0x7f5bf9c439d0, tls: 0x7f5bf9c43700) = 26654 (perf)
           0.160 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf9441f30, parent_tidptr: 0x7f5bf94429d0, child_tidptr: 0x7f5bf94429d0, tls: 0x7f5bf9442700) = 26655 (perf)
           0.232 ( 0.013 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5bf8c40f30, parent_tidptr: 0x7f5bf8c419d0, child_tidptr: 0x7f5bf8c419d0, tls: 0x7f5bf8c41700) = 26656 (perf)
           0.393 ( 0.011 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be3ffef30, parent_tidptr: 0x7f5be3fff9d0, child_tidptr: 0x7f5be3fff9d0, tls: 0x7f5be3fff700) = 26657 (perf)
           0.802 ( 0.012 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be37fdf30, parent_tidptr: 0x7f5be37fe9d0, child_tidptr: 0x7f5be37fe9d0, tls: 0x7f5be37fe700) = 26658 (perf)
           1.411 ( 0.022 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26659 (perf)
         246.422 ( 0.042 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7f5be2ffcf30, parent_tidptr: 0x7f5be2ffd9d0, child_tidptr: 0x7f5be2ffd9d0, tls: 0x7f5be2ffd700) = 26660 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-5-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0c6b4994
    • K
      perf top: Implement multithreading for perf_event__synthesize_threads · 340b47f5
      Kan Liang 提交于
      The proc files which is sorted with alphabetical order are evenly
      assigned to several synthesize threads to be processed in parallel.
      
      For 'perf top', the threads number hard code to online CPU number. The
      following patch will introduce an option to set it.
      
      For other perf tools, the thread number is 1. Because the process
      function is not ready for multithreading, e.g.
      process_synthesized_event.
      
      This patch series only support event synthesize multithreading for 'perf
      top'. For other tools, it can be done separately later.
      
      With multithread applied, the total processing time can get up to 1.56x
      speedup on Knights Mill for 'perf top'.
      
      For specific single event processing, the processing time could increase
      because of the lock contention. So proc_map_timeout may need to be
      increased. Otherwise some proc maps will be truncated.
      
      Based on my test, increasing the proc_map_timeout has small impact
      on the total processing time. The total processing time still get 1.49x
      speedup on Knights Mill after increasing the proc_map_timeout.
      The patch itself doesn't increase the proc_map_timeout.
      
      Doesn't need to implement multithreading for per task monitoring,
      perf_event__synthesize_thread_map. It doesn't have performance issue.
      
      Committer testing:
      
        # getconf _NPROCESSORS_ONLN
        4
        # perf trace --no-inherit -e clone -o /tmp/output perf top
        # tail -4 /tmp/bla
           0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
           0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
           0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
         246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
        #
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      340b47f5
    • K
      perf tools: Lock to protect comm_str rb tree · f988e71b
      Kan Liang 提交于
      Add comm_str_lock to protect comm_str rb tree.
      
      The lock is only needed for multithreaded code, so using mutex wrappers
      provided by perf tool.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-3-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f988e71b
    • K
      perf tools: Lock to protect namespaces and comm list · b32ee9e5
      Kan Liang 提交于
      Add two locks to protect namespaces_list and comm_list.
      
      The lock is only needed for multithreaded code, so using mutex wrappers
      provided by perf tool.
      
      Not all the comm_list/namespaces_list accessing are protected, e.g.
      thread__exec_comm. Because the multithread code for perf top event
      synthesizing does not touch them. They don't need a lock.
      Signed-off-by: NKan Liang <kan.liang@intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-2-git-send-email-kan.liang@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b32ee9e5
  9. 29 9月, 2017 1 次提交
  10. 25 9月, 2017 3 次提交
    • A
      perf tools: Fix syscalltbl build failure · 090657c9
      Akemi Yagi 提交于
      The build of kernel v4.14-rc1 for i686 fails on RHEL 6 with the error
      in tools/perf:
      
        util/syscalltbl.c:157: error: expected ';', ',' or ')' before '__maybe_unused'
        mv: cannot stat `util/.syscalltbl.o.tmp': No such file or directory
      
      Fix it by placing/moving:
      
        #include <linux/compiler.h>
      
        outside of #ifdef HAVE_SYSCALL_TABLE block.
      Signed-off-by: NAkemi Yagi <toracat@elrepo.org>
      Cc: Alan Bartlett <ajb@elrepo.org>
      Link: http://lkml.kernel.org/r/oq41r8$1v9$1@blaine.gmane.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      090657c9
    • M
      perf report: Fix debug messages with --call-graph option · 9789e7e9
      Mengting Zhang 提交于
      With --call-graph option, perf report can display call chains using
      type, min percent threshold, optional print limit and order. And the
      default call-graph parameter is 'graph,0.5,caller,function,percent'.
      
      Before this patch, 'perf report --call-graph' shows incorrect debug
      messages as below:
      
        # perf report --call-graph
        Invalid callchain mode: 0.5
        Invalid callchain order: 0.5
        Invalid callchain sort key: 0.5
        Invalid callchain config key: 0.5
        Invalid callchain mode: caller
        Invalid callchain mode: function
        Invalid callchain order: function
        Invalid callchain mode: percent
        Invalid callchain order: percent
        Invalid callchain sort key: percent
      
      That is because in function __parse_callchain_report_opt(),each field of
      the call-graph parameter is passed to parse_callchain_{mode,order,
      sort_key,value} in turn until it meets the matching value.
      
      For example, the order field "caller" is passed to
      parse_callchain_mode() firstly and obviously it doesn't match any mode
      field. Therefore parse_callchain_mode() will shows the debug message
      "Invalid callchain mode: caller", which could confuse users.
      
      The patch fixes this issue by moving the warning out of the function
      parse_callchain_{mode,order,sort_key,value}.
      Signed-off-by: NMengting Zhang <zhangmengting@huawei.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Li Bin <huawei.libin@huawei.com>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/r/1506154694-39691-1-git-send-email-zhangmengting@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9789e7e9
    • A
      perf evsel: Fix attr.exclude_kernel setting for default cycles:p · f1e52f14
      Arnaldo Carvalho de Melo 提交于
      Yet another fix for probing the max attr.precise_ip setting: it is not
      enough settting attr.exclude_kernel for !root users, as they _can_
      profile the kernel if the kernel.perf_event_paranoid sysctl is set to
      -1, so check that as well.
      
      Testing it:
      
      As non root:
      
        $ sysctl kernel.perf_event_paranoid
        kernel.perf_event_paranoid = 2
        $ perf record sleep 1
        $ perf evlist -v
        cycles:uppp: ..., exclude_kernel: 1, ... precise_ip: 3, ...
      
      Now as non-root, but with kernel.perf_event_paranoid set set to the
      most permissive value, -1:
      
        $ sysctl kernel.perf_event_paranoid
        kernel.perf_event_paranoid = -1
        $ perf record sleep 1
        $ perf evlist -v
        cycles:ppp: ..., exclude_kernel: 0, ... precise_ip: 3, ...
        $
      
      I.e. non-root, default kernel.perf_event_paranoid: :uppp modifier = not allowed to sample the kernel,
           non-root, most permissible kernel.perf_event_paranoid: :ppp = allowed to sample the kernel.
      
      In both cases, use the highest available precision: attr.precise_ip = 3.
      Reported-and-Tested-by: NIngo Molnar <mingo@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Fixes: d37a3697 ("perf evsel: Fix attr.exclude_kernel setting for default cycles:p")
      Link: http://lkml.kernel.org/n/tip-nj2qkf75xsd6pw6hhjzfqqdx@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f1e52f14
  11. 22 9月, 2017 2 次提交
    • A
      perf tools: Provide mutex wrappers for pthreads rwlocks · 0a7c74ea
      Arnaldo Carvalho de Melo 提交于
      Andi reported a performance drop in single threaded perf tools such as
      'perf script' due to the growing number of locks being put in place to
      allow for multithreaded tools, so wrap the POSIX threads rwlock routines
      with the names used for such kinds of locks in the Linux kernel and then
      allow for tools to ask for those locks to be used or not.
      
      I.e. a tool may have a multithreaded phase and then switch to single
      threaded, like the upcoming patches for the synthesizing of
      PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to
      single threaded mode in 'perf top'.
      
      The init routines will not be conditional, this way starting as single
      threaded to then move to multi threaded mode should be possible.
      Reported-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      0a7c74ea
    • A
      perf stat: Fix adding multiple event groups · 411bc316
      Andi Kleen 提交于
      The -M metric group parser threw away the events of earlier groups when
      multiple groups were specified. Fix this here by not overwriting the
      string incorrectly.
      
      Now this works correctly:
      
      % perf stat -M Summary,SMT --metric-only -a sleep 1
      
       Performance counter stats for 'system wide':
      
      Instructions CPI CLKS         CPU_Utilization GFLOPs SMT_2T_Utilization SMT_2T_Utilization Kernel_Utilization CoreIPC CORE_CLKS
      900907376.0  2.7 2398954144.0 0.1             0.0    0.2                0.2                0.1                0.4     2080822855.5
      
      while previously it would only show the SMT metrics.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20170914205735.18431-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      411bc316
  12. 18 9月, 2017 3 次提交
  13. 13 9月, 2017 11 次提交