1. 10 3月, 2021 2 次提交
  2. 19 2月, 2021 1 次提交
    • K
      perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing · fbefe9c2
      Kan Liang 提交于
      For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the
      instruction latency. Current perf forces the var2_w to the data->ins_lat
      in the generic code. It works well for now because X86 is the only
      architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may
      bring problems once other architectures support the sample type.  For
      example, the var2_w may be used to capture something else on PowerPC.
      
      Create two architecture specific functions to parse and synthesize the
      weight related samples. Move the X86 specific codes to the X86 version
      functions. Other architectures can implement their own functions later
      separately.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      fbefe9c2
  3. 13 2月, 2021 1 次提交
  4. 09 2月, 2021 2 次提交
    • K
      perf report: Support instruction latency · 590db42d
      Kan Liang 提交于
      The instruction latency information can be recorded on some platforms,
      e.g., the Intel Sapphire Rapids server. With both memory latency
      (weight) and the new instruction latency information, users can easily
      locate the expensive load instructions, and also understand the time
      spent in different stages. The users can optimize their applications in
      different pipeline stages.
      
      The 'weight' field is shared among different architectures. Reusing the
      'weight' field may impacts other architectures. Add a new field to store
      the instruction latency.
      
      Like the 'weight' support, introduce a 'ins_lat' for the global
      instruction latency, and a 'local_ins_lat' for the local instruction
      latency version.
      
      Add new sort functions, INSTR Latency and Local INSTR Latency,
      accordingly.
      
      Add local_ins_lat to the default_mem_sort_order[].
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      590db42d
    • K
      perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT · ea8d0ed6
      Kan Liang 提交于
      The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
      PERF_SAMPLE_WEIGHT sample type. Users can apply either the
      PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample
      type to retrieve the sample weight, but they cannot apply both sample
      types simultaneously.
      
      The new sample type shares the same space as the PERF_SAMPLE_WEIGHT
      sample type. The lower 32 bits are exactly the same for both sample
      type. The higher 32 bits may be different for different architecture.
      
      Add arch specific arch_evsel__set_sample_weight() to set the new sample
      type for X86. Only store the lower 32 bits for the sample->weight if the
      new sample type is applied. In practice, no memory access could last
      than 4G cycles. No data will be lost.
      
      If the kernel doesn't support the new sample type. Fall back to the
      PERF_SAMPLE_WEIGHT sample type.
      
      There is no impact for other architectures.
      
      Committer notes:
      
      Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core
      but not upstream yet.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ea8d0ed6
  5. 04 2月, 2021 3 次提交
    • N
      perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events · 473f742e
      Namhyung Kim 提交于
      Like in __event__synthesize_thread(), I think it's better to use
      scandir() instead of the readdir() loop.  In case some malicious task
      continues to create new threads, the readdir() loop will run over and
      over to collect tids.  The scandir() also has the problem but the window
      is much smaller since it doesn't do much work during the iteration.
      
      Also add filter_task() function as we only care the tasks.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      473f742e
    • N
      perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads · c1b90795
      Namhyung Kim 提交于
      To synthesize information to resolve sample IPs, it needs to scan task
      and mmap info from the /proc filesystem.  For each process, it opens
      (and reads) status and maps file respectively.  But as kernel threads
      don't have memory maps so we can skip the maps file.
      
      To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file.
      It's about the peak virtual memory usage so only user-level tasks have
      that.  Note that it's possible to miss the line due to partial reads.
      So we should double-check if it's a really kernel thread when there's no
      VmPeak line.
      
      Thus check "Threads:" line (which follows the VmPeak line whether or not
      it exists) to be sure it's read enough data - just in case of deeply
      nested pid namespaces or large number of supplementary groups are
      involved.
      
      This is for user process:
      
        $ head -40 /proc/1/status
        Name:	systemd
        Umask:	0000
        State:	S (sleeping)
        Tgid:	1
        Ngid:	0
        Pid:	1
        PPid:	0
        TracerPid:	0
        Uid:	0	0	0	0
        Gid:	0	0	0	0
        FDSize:	256
        Groups:
        NStgid:	1
        NSpid:	1
        NSpgid:	1
        NSsid:	1
        VmPeak:	  234192 kB           <-- here
        VmSize:	  169964 kB
        VmLck:	       0 kB
        VmPin:	       0 kB
        VmHWM:	   29528 kB
        VmRSS:	    6104 kB
        RssAnon:	    2756 kB
        RssFile:	    3348 kB
        RssShmem:	       0 kB
        VmData:	   19776 kB
        VmStk:	    1036 kB
        VmExe:	     784 kB
        VmLib:	    9532 kB
        VmPTE:	     116 kB
        VmSwap:	    2400 kB
        HugetlbPages:	       0 kB
        CoreDumping:	0
        THP_enabled:	1
        Threads:	1                     <-- and here
        SigQ:	1/62808
        SigPnd:	0000000000000000
        ShdPnd:	0000000000000000
        SigBlk:	7be3c0fe28014a03
        SigIgn:	0000000000001000
      
      And this is for kernel thread:
      
        $ head -20 /proc/2/status
        Name:	kthreadd
        Umask:	0000
        State:	S (sleeping)
        Tgid:	2
        Ngid:	0
        Pid:	2
        PPid:	0
        TracerPid:	0
        Uid:	0	0	0	0
        Gid:	0	0	0	0
        FDSize:	64
        Groups:
        NStgid:	2
        NSpid:	2
        NSpgid:	0
        NSsid:	0
        Threads:	1                     <-- here
        SigQ:	1/62808
        SigPnd:	0000000000000000
        ShdPnd:	0000000000000000
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c1b90795
    • N
      perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis · 30626e08
      Namhyung Kim 提交于
      To save memory usage, it needs to reduce the number of entries in the
      proc filesystem.  It's using /proc/<PID>/task directory to traverse
      threads in the process and then kernel creates /proc/<PID>/task/<TID>
      entries.
      
      After that it checks the thread info using the /proc/<TID>/status file
      rather than /proc/<PID>/task/<TID>/status.  As far as I can see, they
      are the same and contain all the info we need.
      
      Using the latter eliminates the unnecessary /proc/<TID> entry.  This can
      be useful especially a large number of threads are used in the system.
      In my experiment around 1KB of memory on average was saved for each
      thread (which is not a thread group leader).
      
      To do this, pass both pid and tid to perf_event_prepare_comm() if it
      knows them.  In case it doesn't know, passing 0 as pid will do the old
      way.
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      30626e08
  6. 21 1月, 2021 1 次提交
  7. 28 12月, 2020 3 次提交
  8. 18 12月, 2020 1 次提交
  9. 01 12月, 2020 1 次提交
  10. 28 11月, 2020 1 次提交
    • N
      perf record: Synthesize cgroup events only if needed · aa50d953
      Namhyung Kim 提交于
      It didn't check the tool->cgroup_events bit which is set when the
      --all-cgroups option is given.  Without it, samples will not have cgroup
      info so no reason to synthesize.
      
      We can check the PERF_RECORD_CGROUP records after running perf record
      *WITHOUT* the --all-cgroups option:
      
      Before:
      
        $ perf report -D | grep CGROUP
        0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
                CGROUP events:          1
                CGROUP events:          0
                CGROUP events:          0
      
      After:
      
        $ perf report -D | grep CGROUP
                CGROUP events:          0
                CGROUP events:          0
                CGROUP events:          0
      
      Committer testing:
      
      Before:
      
        # perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:        146
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      After:
      
        # perf record -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:          0
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      With all-cgroups:
      
        # perf record --all-cgroups -a sleep 1
        [ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
        # perf report -D | grep "CGROUP events"
                  CGROUP events:        146
                  CGROUP events:          0
                  CGROUP events:          0
        #
      
      Fixes: 8fb4b679 ("perf record: Add --all-cgroups option")
      Signed-off-by: NNamhyung Kim <namhyung@kernel.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      aa50d953
  11. 14 10月, 2020 1 次提交
  12. 23 9月, 2020 1 次提交
    • L
      perf tsc: Move out common functions from x86 · 03fca3af
      Leo Yan 提交于
      Functions perf_read_tsc_conversion() and perf_event__synth_time_conv()
      should work as common functions rather than x86 specific, so move these
      two functions out from arch/x86 folder and place them into util/tsc.c.
      
      Since the function perf_event__synth_time_conv() will be linked in
      util/tsc.c, remove its weak version.
      
      Committer testing:
      
      Before/after:
      
        # perf test tsc
        70: Convert perf time to TSC                                        : Ok
        #
        # perf test -v tsc
        70: Convert perf time to TSC                                        :
        --- start ---
        test child forked, pid 8520
        mmap size 528384B
        1st event perf time 592110439891 tsc 2317172044331
        rdtsc          time 592110441915 tsc 2317172052010
        2nd event perf time 592110442336 tsc 2317172053605
        test child finished with 0
        ---- end ----
        Convert perf time to TSC: Ok
        #
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Kemeng Shi <shikemeng@huawei.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Nick Gasson <nick.gasson@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Remi Bernon <rbernon@codeweavers.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steve Maclean <steve.maclean@microsoft.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zou Wei <zou_wei@huawei.com>
      Link: http://lore.kernel.org/lkml/20200914115311.2201-2-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      03fca3af
  13. 30 4月, 2020 2 次提交
    • Z
      perf tools: Remove unneeded semicolons · 8284bbea
      Zou Wei 提交于
      Fixes coccicheck warnings:
      
        tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon
        tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon
        tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon
        tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon
        tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon
        tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon
        tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NZou Wei <zou_wei@huawei.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8284bbea
    • I
      perf synthetic events: Remove use of sscanf from /proc reading · 2069425e
      Ian Rogers 提交于
      The synthesize benchmark, run on a single process and thread, shows
      perf_event__synthesize_mmap_events as the hottest function with fgets
      and sscanf taking the majority of execution time.
      
      fscanf performs similarly well. Replace the scanf call with manual
      reading of each field of the /proc/pid/maps line, and remove some
      unnecessary buffering.
      
      This change also addresses potential, but unlikely, buffer overruns for
      the string values read by scanf.
      
      Performance before is:
      
        $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
        \# Running 'internals/synthesize' benchmark:
        Computing performance of single threaded perf event synthesis by
        synthesizing events on the perf process itself:
          Average synthesis took: 102.810 usec (+- 0.027 usec)
          Average num. events: 17.000 (+- 0.000)
          Average time per event 6.048 usec
          Average data synthesis took: 106.325 usec (+- 0.018 usec)
          Average num. events: 89.000 (+- 0.000)
          Average time per event 1.195 usec
        Computing performance of multi threaded perf event synthesis by
        synthesizing events on CPU 0:
          Number of synthesis threads: 16
            Average synthesis took: 68103.100 usec (+- 441.234 usec)
            Average num. events: 30703.000 (+- 0.730)
            Average time per event 2.218 usec
      
      And after is:
      
        $ sudo perf bench internals synthesize -m 16 -M 16 -s -t
        \# Running 'internals/synthesize' benchmark:
        Computing performance of single threaded perf event synthesis by
        synthesizing events on the perf process itself:
          Average synthesis took: 50.388 usec (+- 0.031 usec)
          Average num. events: 17.000 (+- 0.000)
          Average time per event 2.964 usec
          Average data synthesis took: 52.693 usec (+- 0.020 usec)
          Average num. events: 89.000 (+- 0.000)
          Average time per event 0.592 usec
        Computing performance of multi threaded perf event synthesis by
        synthesizing events on CPU 0:
          Number of synthesis threads: 16
            Average synthesis took: 45022.400 usec (+- 552.740 usec)
            Average num. events: 30624.200 (+- 10.037)
            Average time per event 1.470 usec
      
      On a Intel Xeon 6154 compiling with Debian gcc 9.2.1.
      
      Committer testing:
      
      On a AMD Ryzen 5 3600X 6-Core Processor:
      
      Before:
      
        # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
        # Running 'internals/synthesize' benchmark:
        Computing performance of single threaded perf event synthesis by
        synthesizing events on the perf process itself:
          Average synthesis took: 267.491 usec (+- 0.176 usec)
          Average num. events: 56.000 (+- 0.000)
          Average time per event 4.777 usec
          Average data synthesis took: 277.257 usec (+- 0.169 usec)
          Average num. events: 287.000 (+- 0.000)
          Average time per event 0.966 usec
        Computing performance of multi threaded perf event synthesis by
        synthesizing events on CPU 0:
          Number of synthesis threads: 12
            Average synthesis took: 81599.500 usec (+- 346.315 usec)
            Average num. events: 36096.100 (+- 2.523)
            Average time per event 2.261 usec
        #
      
      After:
      
        # perf bench internals synthesize --min-threads 12 --max-threads 12 --st --mt
        # Running 'internals/synthesize' benchmark:
        Computing performance of single threaded perf event synthesis by
        synthesizing events on the perf process itself:
          Average synthesis took: 110.125 usec (+- 0.080 usec)
          Average num. events: 56.000 (+- 0.000)
          Average time per event 1.967 usec
          Average data synthesis took: 118.518 usec (+- 0.057 usec)
          Average num. events: 287.000 (+- 0.000)
          Average time per event 0.413 usec
        Computing performance of multi threaded perf event synthesis by
        synthesizing events on CPU 0:
          Number of synthesis threads: 12
            Average synthesis took: 43490.700 usec (+- 284.527 usec)
            Average num. events: 37028.500 (+- 0.563)
            Average time per event 1.175 usec
        #
      Signed-off-by: NIan Rogers <irogers@google.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NNamhyung Kim <namhyung@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200415054050.31645-4-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2069425e
  14. 16 4月, 2020 1 次提交
    • I
      perf synthetic-events: save 4kb from 2 stack frames · 04ed4ccb
      Ian Rogers 提交于
      Reuse an existing char buffer to avoid two PATH_MAX sized char buffers.
      
      Reduces stack frame sizes by 4kb.
      
      perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after
      'sub $0x35b8,%rsp'.
      
      perf_event__get_comm_ids before 'sub $0x2028,%rsp' after
      'sub $0x1028,%rsp'.
      
      The performance impact of this change is negligible.
      Signed-off-by: NIan Rogers <irogers@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andrey Zhizhikin <andrey.z@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lore.kernel.org/lkml/20200402154357.107873-4-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      04ed4ccb
  15. 03 4月, 2020 2 次提交
  16. 18 3月, 2020 1 次提交
    • I
      perf tools: Give synthetic mmap events an inode generation · 3b7a15b0
      Ian Rogers 提交于
      When mmap2 events are synthesized the ino_generation field isn't being
      set leading to uninitialized memory being compared.
      
      Caught with clang's -fsanitize=memory:
      
      ==124733==WARNING: MemorySanitizer: use-of-uninitialized-value
          #0 0x55a96a6a65cc in __dso_id__cmp tools/perf/util/dsos.c:23:6
          #1 0x55a96a6a81d5 in dso_id__cmp tools/perf/util/dsos.c:38:9
          #2 0x55a96a6a717f in __dso__cmp_long_name tools/perf/util/dsos.c:74:15
          #3 0x55a96a6a6c4c in __dsos__findnew_link_by_longname_id tools/perf/util/dsos.c:106:12
          #4 0x55a96a6a851e in __dsos__findnew_by_longname_id tools/perf/util/dsos.c:178:9
          #5 0x55a96a6a7798 in __dsos__find_id tools/perf/util/dsos.c:191:9
          #6 0x55a96a6a7b57 in __dsos__findnew_id tools/perf/util/dsos.c:251:20
          #7 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #8 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #9 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #10 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #11 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #12 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #13 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #14 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #15 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #16 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #17 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #18 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #19 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #20 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #21 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #22 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #23 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #24 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #25 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #26 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #27 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #28 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was stored to memory at
          #1 0x55a96a6a18f7 in dso__new_id tools/perf/util/dso.c:1230:14
          #2 0x55a96a6a78ee in __dsos__addnew_id tools/perf/util/dsos.c:233:20
          #3 0x55a96a6a7bcc in __dsos__findnew_id tools/perf/util/dsos.c:252:21
          #4 0x55a96a6a7a57 in dsos__findnew_id tools/perf/util/dsos.c:259:17
          #5 0x55a96a7776ae in machine__findnew_dso_id tools/perf/util/machine.c:2709:9
          #6 0x55a96a77dfcf in map__new tools/perf/util/map.c:193:10
          #7 0x55a96a77240a in machine__process_mmap2_event tools/perf/util/machine.c:1670:8
          #8 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #9 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #10 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #11 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #12 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #13 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #14 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #15 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #16 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #17 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #18 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #19 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
      
        Uninitialized value was stored to memory at
          #0 0x55a96a7725af in machine__process_mmap2_event tools/perf/util/machine.c:1646:25
          #1 0x55a96a7741a3 in machine__process_event tools/perf/util/machine.c:1882:9
          #2 0x55a96a6aee39 in perf_event__process tools/perf/util/event.c:454:9
          #3 0x55a96a87d633 in perf_tool__process_synth_event tools/perf/util/synthetic-events.c:63:9
          #4 0x55a96a87f131 in perf_event__synthesize_mmap_events tools/perf/util/synthetic-events.c:403:7
          #5 0x55a96a8815d6 in __event__synthesize_thread tools/perf/util/synthetic-events.c:548:9
          #6 0x55a96a882bff in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:681:3
          #7 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #8 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #9 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #10 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #11 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #12 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #13 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #14 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #15 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #16 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #17 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #18 0x55a96a282223 in main tools/perf/perf.c:538:3
      
        Uninitialized value was created by a heap allocation
          #0 0x55a96a22f60d in malloc llvm/llvm-project/compiler-rt/lib/msan/msan_interceptors.cpp:925:3
          #1 0x55a96a882948 in __perf_event__synthesize_threads tools/perf/util/synthetic-events.c:655:15
          #2 0x55a96a881ec2 in perf_event__synthesize_threads tools/perf/util/synthetic-events.c:750:9
          #3 0x55a96a562b26 in synth_all tools/perf/tests/mmap-thread-lookup.c:136:9
          #4 0x55a96a5623b1 in mmap_events tools/perf/tests/mmap-thread-lookup.c:174:8
          #5 0x55a96a561fa0 in test__mmap_thread_lookup tools/perf/tests/mmap-thread-lookup.c:230:2
          #6 0x55a96a52c182 in run_test tools/perf/tests/builtin-test.c:378:9
          #7 0x55a96a52afc1 in test_and_print tools/perf/tests/builtin-test.c:408:9
          #8 0x55a96a52966e in __cmd_test tools/perf/tests/builtin-test.c:603:4
          #9 0x55a96a52855d in cmd_test tools/perf/tests/builtin-test.c:747:9
          #10 0x55a96a2844d4 in run_builtin tools/perf/perf.c:312:11
          #11 0x55a96a282bd0 in handle_internal_command tools/perf/perf.c:364:8
          #12 0x55a96a284097 in run_argv tools/perf/perf.c:408:2
          #13 0x55a96a282223 in main tools/perf/perf.c:538:3
      
      SUMMARY: MemorySanitizer: use-of-uninitialized-value tools/perf/util/dsos.c:23:6 in __dso_id__cmp
      Signed-off-by: NIan Rogers <irogers@google.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: clang-built-linux@googlegroups.com
      Link: http://lore.kernel.org/lkml/20200313053129.131264-1-irogers@google.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3b7a15b0
  17. 10 3月, 2020 1 次提交
    • K
      perf tools: Add hw_idx in struct branch_stack · 42bbabed
      Kan Liang 提交于
      The low level index of raw branch records for the most recent branch can
      be recorded in a sample with PERF_SAMPLE_BRANCH_HW_INDEX
      branch_sample_type. Extend struct branch_stack to support it.
      
      However, if the PERF_SAMPLE_BRANCH_HW_INDEX is not applied, only nr and
      entries[] will be output by kernel. The pointer of entries[] could be
      wrong, since the output format is different with new struct
      branch_stack.  Add a variable no_hw_idx in struct perf_sample to
      indicate whether the hw_idx is output.  Add get_branch_entry() to return
      corresponding pointer of entries[0].
      
      To make dummy branch sample consistent as new branch sample, add hw_idx
      in struct dummy_branch_stack for cs-etm and intel-pt.
      
      Apply the new struct branch_stack for synthetic events as well.
      
      Extend test case sample-parsing to support new struct branch_stack.
      
      Committer notes:
      
      Renamed get_branch_entries() to perf_sample__branch_entries() to have
      proper namespacing and pave the way for this to be moved to libperf,
      eventually.
      
      Add 'static' to that inline as it is in a header.
      
      Add 'hw_idx' to 'struct dummy_branch_stack' in cs-etm.c to fix the build
      on arm64.
      Signed-off-by: NKan Liang <kan.liang@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pavel Gerasimov <pavel.gerasimov@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vitaly Slobodskoy <vitaly.slobodskoy@intel.com>
      Link: http://lore.kernel.org/lkml/20200228163011.19358-2-kan.liang@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      42bbabed
  18. 26 11月, 2019 1 次提交
    • A
      perf maps: Merge 'struct maps' with 'struct map_groups' · 79b6bb73
      Arnaldo Carvalho de Melo 提交于
      And pick the shortest name: 'struct maps'.
      
      The split existed because we used to have two groups of maps, one for
      functions and one for variables, but that only complicated things,
      sometimes we needed to figure out what was at some address and then had
      to first try it on the functions group and if that failed, fall back to
      the variables one.
      
      That split is long gone, so for quite a while we had only one struct
      maps per struct map_groups, simplify things by combining those structs.
      
      First patch is the minimum needed to merge both, follow up patches will
      rename 'thread->mg' to 'thread->maps', etc.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Link: https://lkml.kernel.org/n/tip-hom6639ro7020o708trhxh59@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      79b6bb73
  19. 21 11月, 2019 1 次提交
  20. 07 11月, 2019 1 次提交
  21. 25 9月, 2019 3 次提交
  22. 20 9月, 2019 3 次提交
  23. 01 9月, 2019 1 次提交
  24. 30 8月, 2019 3 次提交
  25. 29 8月, 2019 2 次提交