1. 18 12月, 2018 11 次提交
  2. 22 11月, 2018 5 次提交
  3. 21 11月, 2018 1 次提交
  4. 20 11月, 2018 2 次提交
    • J
      perf tools: Restore proper cwd on return from mnt namespace · b01c1f69
      Jiri Olsa 提交于
      When reporting on 'record' server we try to retrieve/use the mnt
      namespace of the profiled tasks. We use following API with cookie to
      hold the return namespace, roughly:
      
        nsinfo__mountns_enter(struct nsinfo *nsi, struct nscookie *nc)
          setns(newns, 0);
        ...
        new ns related open..
        ...
        nsinfo__mountns_exit(struct nscookie *nc)
          setns(nc->oldns)
      
      Once finished we setns to old namespace, which also sets the current
      working directory (cwd) to "/", trashing the cwd we had.
      
      This is mostly fine, because we use absolute paths almost everywhere,
      but it screws up 'perf diff':
      
        # perf diff
        failed to open perf.data: No such file or directory  (try 'perf record' first)
        ...
      
      Adding the current working directory to be part of the cookie and
      restoring it in the nsinfo__mountns_exit call.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Fixes: 843ff37b ("perf symbols: Find symbols in different mount namespace")
      Link: http://lkml.kernel.org/r/20181101170001.30019-1-jolsa@kernel.org
      [ No need to check for NULL args for free(), use zfree() for struct members ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      b01c1f69
    • A
      tools build feature: Check if get_current_dir_name() is available · 8feb8efe
      Arnaldo Carvalho de Melo 提交于
      As the namespace support code will use this, which is not available in
      some non _GNU_SOURCE libraries such as Android's bionic used in my
      container build tests (r12b and r15c at the moment).
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-x56ypm940pwclwu45d7jfj47@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8feb8efe
  5. 13 11月, 2018 1 次提交
  6. 06 11月, 2018 5 次提交
    • J
      perf tools: Do not zero sample_id_all for group members · 8e88c29b
      Jiri Olsa 提交于
      Andi reported following malfunction:
      
        # perf record -e '{ref-cycles,cycles}:S' -a sleep 1
        # perf script
        non matching sample_id_all
      
      That's because we disable sample_id_all bit for non-sampling group
      members. We can't do that, because it needs to be the same over the
      whole event list. This patch keeps it untouched again.
      Reported-by: NAndi Kleen <andi@firstfloor.org>
      Tested-by: NAndi Kleen <andi@firstfloor.org>
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180923150420.27327-1-jolsa@kernel.org
      Fixes: e9add8ba ("perf evsel: Disable write_backward for leader sampling group events")
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      8e88c29b
    • A
      perf intel-pt: Add MTC and CYC timestamps to debug log · f6c23e3b
      Adrian Hunter 提交于
      One cause of decoding errors is un-synchronized side-band data.
      Timestamps are needed to debug such cases. TSC packet timestamps are
      logged. Log also MTC and CYC timestamps.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/20181105073505.8129-3-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f6c23e3b
    • A
      perf intel-pt: Add more event information to debug log · 93f8be27
      Adrian Hunter 提交于
      More event information is useful for debugging, especially MMAP events.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Link: http://lkml.kernel.org/r/20181105073505.8129-2-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      93f8be27
    • T
      perf stat: Handle different PMU names with common prefix · ea1fa48c
      Thomas Richter 提交于
      On s390 the CPU Measurement Facility for counters now supports
      2 PMUs named cpum_cf (CPU Measurement Facility for counters) and
      cpum_cf_diag (CPU Measurement Facility for diagnostic counters)
      for one and the same CPU.
      
      Running command
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      	 -- ~/mytests/cf-tx-events 1
      
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
        2      tx_c_tend
      
            0.002120091 seconds time elapsed
      
            0.000121000 seconds user
            0.002127000 seconds sys
      
       [root@s35lp76 perf]#
      
      displays output which is unexpected (and wrong):
      
        2      tx_c_tend
      
      The test program definitely triggers only one transaction, as shown
      in line 'TX_C_TEND: 1 expected:1'.
      
      This is caused by the following call sequence:
      
      pmu_lookup() scans and installs a PMU.
      +--> pmu_aliases() parses all aliases in directory
      		.../<pmu-name>/events/* which are file names.
           +--> pmu_aliases_parse() Read each file in directory and create
                            an new alias entry. This is done with
                +--> perf_pmu__new_alias() and
      	       +--> __perf_pmu__new_alias() which also check for
      	                   identical alias names.
      
      After pmu_aliases() returns, a complete list of event names
      for this pmu has been created. Now function
      
      pmu_add_cpu_aliases()   is called to add the events listed in the json
      |                       files to the alias list of the cpu.
      +--> perf_pmu__find_map()  Returns a pointer to the json events.
      
      Now function pmu_add_cpu_aliases() scans through all events listed
      in the JSON files for this CPU.
      Each json event pmu name is compared with the current PMU being
      built up and if they mismatch, the json event is added to the
      current PMUs alias list.
      To avoid duplicate entries the following comparison is done:
      
      	if (!is_arm_pmu_core(name)) {
      	     pname = pe->pmu ? pe->pmu : "cpu";
      	     if (strncmp(pname, name, strlen(pname)))
      		     continue;
           }
      
      The culprit is the strncmp() function.
      
      Using current s390 PMU naming, the first PMU is 'cpum_cf'
      and a long list of events is added, among them 'tx_c_tend'
      
      When the second PMU named 'cpum_cf_diag' is added, only one event
      named 'CF_DIAG' is added by the pmu_aliases()  function.
      
      Now function pmu_add_cpu_aliases() is invoked for PMU 'cpum_cf_diag'.
      Since the CPUID string is the same for both PMUs, json file events
      for PMU named 'cpum_cf' are added to the PMU 'cpm_cf_diag'
      
      This happens because the strncmp() actually compares:
      
           strncmp("cpum_cf", "cpum_cf_diag", 6);
      
      The first parameter is the pmu name taken from the event in
      the json file. The second parameter is the pmu name of the PMU
      currently being built.
      They are different, but the length of the compare only tests the
      common prefix and this returns 0(true) when it should return false.
      
      Now all events for PMU cpum_cf are added to the alias list for pmu
      cpum_cf_diag.
      
      Later on in function parse_events_add_pmu() the event 'tx_c_end' is
      searched in all available PMUs and found twice, adding it two
      times to the evsel_list global variable which is the root
      of all events. This results in a counter value of 2 instead
      of 1.
      
      Output with this patch:
      
       [root@s35lp76 perf]# ./perf stat -e tx_c_tend \
      			-- ~/mytests/cf-tx-events 1
       Measuring transactions
       TX_C_TABORT_NO_SPECIAL: 0 expected:0
       TX_C_TABORT_SPECIAL: 0 expected:0
       TX_C_TEND: 1 expected:1
       TX_NC_TABORT: 11 expected:11
       TX_NC_TEND: 1 expected:1
      
       Performance counter stats for '/root/mytests/cf-tx-events 1':
      
                        1      tx_c_tend
      
            0.001815365 seconds time elapsed
      
            0.000123000 seconds user
            0.001756000 seconds sys
      
       [root@s35lp76 perf]#
      Signed-off-by: NThomas Richter <tmricht@linux.ibm.com>
      Reviewed-by: NHendrik Brueckner <brueckner@linux.ibm.com>
      Reviewed-by: NSebastien Boisvert <sboisvert@gydle.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Kan Liang <kan.liang@linux.intel.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: stable@vger.kernel.org
      Fixes: 292c34c1 ("perf pmu: Fix core PMU alias list for X86 platform")
      Link: http://lkml.kernel.org/r/20181023151616.78193-1-tmricht@linux.ibm.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ea1fa48c
    • A
      perf evlist: Move perf_evsel__reset_weak_group into evlist · c3537fc2
      Andi Kleen 提交于
      - Move the function from builtin-stat to evlist for reuse
      - Rename to evlist to match purpose better
      - Pass the evlist as first argument.
      - No functional changes
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181001195927.14211-1-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c3537fc2
  7. 31 10月, 2018 6 次提交
    • A
      perf intel-pt/bts: Calculate cpumode for synthesized samples · 5d4f0eda
      Adrian Hunter 提交于
      In the absence of a fallback, samples must provide a correct cpumode for
      the 'ip'. Do that now there is no fallback.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/20181031091043.23465-6-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5d4f0eda
    • A
      perf intel-pt: Insert callchain context into synthesized callchains · 24248306
      Adrian Hunter 提交于
      In the absence of a fallback, callchains must encode also the callchain
      context. Do that now there is no fallback.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Reviewed-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Leo Yan <leo.yan@linaro.org>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: stable@vger.kernel.org # 4.19
      Link: http://lkml.kernel.org/r/100ea2ec-ed14-b56d-d810-e0a6d2f4b069@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      24248306
    • D
      perf tools: Don't clone maps from parent when synthesizing forks · 4f8f382e
      David Miller 提交于
      When synthesizing FORK events, we are trying to create thread objects
      for the already running tasks on the machine.
      
      Normally, for a kernel FORK event, we want to clone the parent's maps
      because that is what the kernel just did.
      
      But when synthesizing, this should not be done.  If we do, we end up
      with overlapping maps as we process the sythesized MMAP2 events that
      get delivered shortly thereafter.
      
      Use the FORK event misc flags in an internal way to signal this
      situation, so we can elide the map clone when appropriate.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Joe Mario <jmario@redhat.com>
      Link: http://lkml.kernel.org/r/20181030.222404.2085088822877051075.davem@davemloft.net
      [ Added comment about flag use in machine__process_fork_event(),
        use ternary op in thread__clone_map_groups() as suggested by Jiri ]
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4f8f382e
    • D
      perf callchain: Honour the ordering of PERF_CONTEXT_{USER,KERNEL,etc} · e9024d51
      David S. Miller 提交于
      When processing using 'perf report -g caller', which is the default, we
      ended up reverting the callchain entries received from the kernel, but
      simply reverting throws away the information that tells that from a
      point onwards the addresses are for userspace, kernel, guest kernel,
      guest user, hypervisor.
      
      The idea is that if we are walking backwards, for each cluster of
      non-cpumode entries we have to first scan backwards for the next one and
      use that for the cluster.
      
      This seems silly and more expensive than it needs to be but it is enough
      for a initial fix.
      
      The code here is really complicated because it is intimately intertwined
      with the lbr and branch handling, as well as this callchain order,
      further fixes will be needed to properly take into account the cpumode
      in those cases.
      
      Another problem with ORDER_CALLER is that the NULL "0" IP that is at the
      end of most callchains shows up at the top of the histogram because
      every callchain contains it and with ORDER_CALLER it is the first entry.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Souvik Banerjee <souvik1997@gmail.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: stable@vger.kernel.org # 4.19
      Link: https://lkml.kernel.org/n/tip-2wt3ayp6j2y2f2xowixa8y6y@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      e9024d51
    • L
      perf cs-etm: Correct CPU mode for samples · d6c9c05f
      Leo Yan 提交于
      Since commit edeb0c90 ("perf tools: Stop fallbacking to kallsyms for
      vdso symbols lookup"), the kernel address cannot be properly parsed to
      kernel symbol with command 'perf script -k vmlinux'.  The reason is
      CoreSight samples is always to set CPU mode as PERF_RECORD_MISC_USER,
      thus it fails to find corresponding map/dso in below flows:
      
        process_sample_event()
          `-> machine__resolve()
      	  `-> thread__find_map(thread, sample->cpumode, sample->ip, al);
      
      In this flow it needs to pass argument 'sample->cpumode' to tell what's
      the CPU mode, before it always passed PERF_RECORD_MISC_USER but without
      any failure until the commit edeb0c90 ("perf tools: Stop fallbacking
      to kallsyms for vdso symbols lookup") has been merged.  The reason is
      even with the wrong CPU mode the function thread__find_map() firstly
      fails to find map but it will rollback to find kernel map for vdso
      symbols lookup.  In the latest code it has removed the fallback code,
      thus if CPU mode is PERF_RECORD_MISC_USER then it cannot find map
      anymore with kernel address.
      
      This patch is to correct samples CPU mode setting, it creates a new
      helper function cs_etm__cpu_mode() to tell what's the CPU mode based on
      the address with the info from machine structure; this patch has a bit
      extension to check not only kernel and user mode, but also check for
      host/guest and hypervisor mode.  Finally this patch uses the function in
      instruction and branch samples and also apply in cs_etm__mem_access()
      for a minor polishing.
      Signed-off-by: NLeo Yan <leo.yan@linaro.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: coresight@lists.linaro.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@kernel.org # v4.19
      Link: http://lkml.kernel.org/r/1540883908-17018-1-git-send-email-leo.yan@linaro.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      d6c9c05f
    • M
      perf unwind: Take pgoff into account when reporting elf to libdwfl · 1fe627da
      Milian Wolff 提交于
      libdwfl parses an ELF file itself and creates mappings for the
      individual sections. perf on the other hand sees raw mmap events which
      represent individual sections. When we encounter an address pointing
      into a mapping with pgoff != 0, we must take that into account and
      report the file at the non-offset base address.
      
      This fixes unwinding with libdwfl in some cases. E.g. for a file like:
      
      ```
      
      using namespace std;
      
      mutex g_mutex;
      
      double worker()
      {
          lock_guard<mutex> guard(g_mutex);
          uniform_real_distribution<double> uniform(-1E5, 1E5);
          default_random_engine engine;
          double s = 0;
          for (int i = 0; i < 1000; ++i) {
              s += norm(complex<double>(uniform(engine), uniform(engine)));
          }
          cout << s << endl;
          return s;
      }
      
      int main()
      {
          vector<std::future<double>> results;
          for (int i = 0; i < 10000; ++i) {
              results.push_back(async(launch::async, worker));
          }
          return 0;
      }
      ```
      
      Compile it with `g++ -g -O2 -lpthread cpp-locking.cpp  -o cpp-locking`,
      then record it with `perf record --call-graph dwarf -e
      sched:sched_switch`.
      
      When you analyze it with `perf script` and libunwind, you should see:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
                  7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
                  7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
                  7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
                  7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
                  7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
                  7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
                  7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
                  7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
                  7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
                  7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
                  7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
                  563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
                  563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
                  563b9cb506fb double std::__invoke_impl<double, double (*)()>(std::__invoke_other, double (*&&)())+0x2b (inlined)
                  563b9cb506fb std::__invoke_result<double (*)()>::type std::__invoke<double (*)()>(double (*&&)())+0x2b (inlined)
                  563b9cb506fb decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<double (*)()> >::_M_invoke<0ul>(std::_Index_tuple<0ul>)+0x2b (inlined)
                  563b9cb506fb std::thread::_Invoker<std::tuple<double (*)()> >::operator()()+0x2b (inlined)
                  563b9cb506fb std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<double>, std::__future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<double (*)()> >, dou>
                  563b9cb506fb std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_>
                  563b9cb507e8 std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>::operator()() const+0x28 (inlined)
                  563b9cb507e8 std::__future_base::_State_baseV2::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)+0x28 (/ssd/milian/>
                  7f38e46d24fe __pthread_once_slow+0xbe (/usr/lib/libpthread-2.28.so)
                  563b9cb51149 __gthread_once+0xe9 (inlined)
                  563b9cb51149 void std::call_once<void (std::__future_base::_State_baseV2::*)(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>*, bool*)>
                  563b9cb51149 std::__future_base::_State_baseV2::_M_set_result(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>, bool)+0xe9 (inlined)
                  563b9cb51149 std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >&&)::{lambda()#1}::op>
                  563b9cb51149 void std::__invoke_impl<void, std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double>
                  563b9cb51149 std::__invoke_result<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<double (*)()> >>
                  563b9cb51149 decltype (__invoke((_S_declval<0ul>)())) std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_>
                  563b9cb51149 std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread::_Invoker<std::tuple<dou>
                  563b9cb51149 std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::__future_base::_Async_state_impl<std::thread::_Invoker<std::tuple<double (*)()> >, double>::_Async_state_impl(std::thread>
                  7f38e45f0062 execute_native_thread_routine+0x12 (/usr/lib/libstdc++.so.6.0.25)
                  7f38e46caa9c start_thread+0xfc (/usr/lib/libpthread-2.28.so)
                  7f38e42ccb22 __GI___clone+0x42 (inlined)
      ```
      
      Before this patch, using libdwfl, you would see:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
              a041161e77950c5c [unknown] ([unknown])
      ```
      
      With this patch applied, we get a bit further in unwinding:
      
      ```
      cpp-locking 20038 [005] 54830.236589: sched:sched_switch: prev_comm=cpp-locking prev_pid=20038 prev_prio=120 prev_state=T ==> next_comm=swapper/5 next_pid=0 next_prio=120
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb166fec5 __sched_text_start+0x545 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1670208 schedule+0x28 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb16737cc rwsem_down_read_failed+0xec (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1665e04 call_rwsem_down_read_failed+0x14 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb1672a03 down_read+0x13 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb106bd85 __do_page_fault+0x445 (/lib/modules/4.14.78-1-lts/build/vmlinux)
              ffffffffb18015f5 page_fault+0x45 (/lib/modules/4.14.78-1-lts/build/vmlinux)
                  7f38e4252591 new_heap+0x101 (/usr/lib/libc-2.28.so)
                  7f38e4252d0b arena_get2.part.4+0x2fb (/usr/lib/libc-2.28.so)
                  7f38e4255b1c tcache_init.part.6+0xec (/usr/lib/libc-2.28.so)
                  7f38e42569e5 __GI___libc_malloc+0x115 (inlined)
                  7f38e4241790 __GI__IO_file_doallocate+0x90 (inlined)
                  7f38e424fbbf __GI__IO_doallocbuf+0x4f (inlined)
                  7f38e424ee47 __GI__IO_file_overflow+0x197 (inlined)
                  7f38e424df36 _IO_new_file_xsputn+0x116 (inlined)
                  7f38e4242bfb __GI__IO_fwrite+0xdb (inlined)
                  7f38e463fa6d std::basic_streambuf<char, std::char_traits<char> >::sputn(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> >::_M_put(char const*, long)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::__write<char>(std::ostreambuf_iterator<char, std::char_traits<char> >, char const*, int)+0x1cd (inlined)
                  7f38e463fa6d std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_float<double>(std::ostreambuf_iterator<c>
                  7f38e464bd70 std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::put(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, double) const+0x90 (inl>
                  7f38e464bd70 std::ostream& std::ostream::_M_insert<double>(double)+0x90 (/usr/lib/libstdc++.so.6.0.25)
                  563b9cb502f7 std::ostream::operator<<(double)+0xb7 (inlined)
                  563b9cb502f7 worker()+0xb7 (/ssd/milian/projects/kdab/rnd/hotspot/build/tests/test-clients/cpp-locking/cpp-locking)
              6eab825c1ee3e4ff [unknown] ([unknown])
      ```
      
      Note that the backtrace is still stopping too early, when compared to
      the nice results obtained via libunwind. It's unclear so far what the
      reason for that is.
      
      Committer note:
      
      Further comment by Milian on the thread started on the Link: tag below:
      
       ---
      The remaining issue is due to a bug in elfutils:
      
      https://sourceware.org/ml/elfutils-devel/2018-q4/msg00089.html
      
      With both patches applied, libunwind and elfutils produce the same output for
      the above scenario.
       ---
      Signed-off-by: NMilian Wolff <milian.wolff@kdab.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Link: http://lkml.kernel.org/r/20181029141644.3907-1-milian.wolff@kdab.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      1fe627da
  8. 25 10月, 2018 2 次提交
    • A
      perf script: Implement --graph-function · 99f753f0
      Andi Kleen 提交于
      Add a ftrace style --graph-function argument to 'perf script' that
      allows to print itrace function calls only below a given function. This
      makes it easier to find the code of interest in a large trace.
      
      % perf record -e intel_pt//k -a sleep 1
      % perf script --graph-function group_sched_in --call-trace
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])          group_sched_in
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              event_sched_in.isra.107
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_event_set_state.part.71
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_time
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_disable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_log_itrace_start
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                      perf_event_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          calc_timer_values
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_cpu
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                          arch_perf_update_userpage
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              __fentry__
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              using_native_sched_clock
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                              sched_clock_stable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])                  perf_pmu_enable
                  perf   900 [000] 194167.205652203: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])          group_sched_in
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])              event_sched_in.isra.107
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_event_set_state.part.71
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_time
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_pmu_disable
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  perf_log_itrace_start
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                  __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                      perf_event_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          calc_timer_values
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_cpu
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          __x86_indirect_thunk_rax
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                          arch_perf_update_userpage
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              __fentry__
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              using_native_sched_clock
               swapper     0 [001] 194167.205660693: ([kernel.kallsyms])                              sched_clock_stable
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Tested-by: NLeo Yan <leo.yan@linaro.org>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Link: http://lkml.kernel.org/r/20180920180540.14039-5-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      99f753f0
    • A
      perf script: Make itrace script default to all calls · 4eb06815
      Andi Kleen 提交于
      By default 'perf script' for itrace outputs sampled instructions or
      branches. In my experience this is confusing to users because it's hard
      to correlate with real program behavior. The sampling makes sense for
      tools like 'perf report' that actually sample to reduce the run time,
      but run time is normally not a problem for 'perf script'.  It's better
      to give an accurate representation of the program flow.
      
      Default 'perf script' to output all calls for itrace. That's a much saner
      default. The old behavior can be still requested with 'perf script'
      --itrace=ibxwpe100000
      
      v2: Fix ETM build failure
      v3: Really fix ETM build failure (Kim Phillips)
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Kim Phillips <kim.phillips@arm.com>
      Cc: Leo Yan <leo.yan@linaro.org>
      Link: http://lkml.kernel.org/r/20180920180540.14039-3-andi@firstfloor.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      4eb06815
  9. 23 10月, 2018 1 次提交
    • A
      perf trace: Introduce per-event maximum number of events property · a9c5e6c1
      Arnaldo Carvalho de Melo 提交于
      Call it 'nr', as in this context it should be expressive enough, i.e.:
      
        # perf trace -e sched:*waking/nr=8,call-graph=fp/
           0.000 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001
                                             try_to_wake_up ([kernel.kallsyms])
                                             sched_clock ([kernel.kallsyms])
           3.933 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=001
                                             try_to_wake_up ([kernel.kallsyms])
                                             sched_clock ([kernel.kallsyms])
           3.970 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          20.069 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          37.170 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          53.267 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          70.365 IPDL Backgroun/3622 sched:sched_waking:comm=Gecko_IOThread pid=3569 prio=120 target_cpu=003
                                             try_to_wake_up ([kernel.kallsyms])
                                             __libc_write (/usr/lib64/libpthread-2.26.so)
          75.781 Web Content/3649 sched:sched_waking:comm=JS Helper pid=3670 prio=120 target_cpu=000
                                             try_to_wake_up ([kernel.kallsyms])
                                             try_to_wake_up ([kernel.kallsyms])
                                             wake_up_q ([kernel.kallsyms])
                                             futex_wake ([kernel.kallsyms])
                                             do_futex ([kernel.kallsyms])
                                             __x64_sys_futex ([kernel.kallsyms])
                                             do_syscall_64 ([kernel.kallsyms])
                                             entry_SYSCALL_64_after_hwframe ([kernel.kallsyms])
                                             pthread_cond_signal@@GLIBC_2.3.2 (/usr/lib64/libpthread-2.26.so)
        #
      
        # perf trace -e sched:*switch/nr=2/,block:*_plug/nr=4/,block:*_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/
           0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> trace:3367 [120]
           0.046 :0/0 sched:sched_switch:swapper/1:0 [120] S ==> kworker/u16:58:2722 [120]
         570.670 irq/50-iwlwifi/680 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff93498051ef00 len=66
                                             __dev_queue_xmit ([kernel.kallsyms])
        1106.141 jbd2/dm-0-8/476 block:block_plug:[jbd2/dm-0-8]
        1106.175 jbd2/dm-0-8/476 block:block_unplug:[jbd2/dm-0-8] 1
        1618.088 kworker/u16:30/2694 block:block_plug:[kworker/u16:30]
        1810.000 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051ef00 len=52
                                             __dev_queue_xmit ([kernel.kallsyms])
        3857.974 :0/0 net:net_dev_queue:dev=vnet0 skbaddr=0xffff93498051f900 len=52
                                             __dev_queue_xmit ([kernel.kallsyms])
        4790.277 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8]
        4790.448 jbd2/dm-2-8/748 block:block_plug:[jbd2/dm-2-8]
        #
      
      The global --max-events has precendence:
      
        # trace --max-events 3 -e sched:*switch/nr=2/,block:*_plug/nr=4/,block:*_unplug/nr=1/,net:*dev_queue/nr=3,max-stack=16/
           0.000 :0/0 sched:sched_switch:swapper/0:0 [120] S ==> qemu-system-x86:2252 [120]
           0.029 qemu-system-x8/2252 sched:sched_switch:qemu-system-x86:2252 [120] D ==> swapper/0:0 [120]
          58.047 DNS Res~er #14/31661 net:net_dev_queue:dev=wlp3s0 skbaddr=0xffff9346966af100 len=84
                                             __dev_queue_xmit ([kernel.kallsyms])
                                             __libc_send (/usr/lib64/libpthread-2.26.so)
        #
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-s4jswltvh660ughvg9nwngah@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      a9c5e6c1
  10. 22 10月, 2018 1 次提交
  11. 20 10月, 2018 2 次提交
    • D
      tools, perf: add and use optimized ring_buffer_{read_head, write_tail} helpers · 09d62154
      Daniel Borkmann 提交于
      Currently, on x86-64, perf uses LFENCE and MFENCE (rmb() and mb(),
      respectively) when processing events from the perf ring buffer which
      is unnecessarily expensive as we can do more lightweight in particular
      given this is critical fast-path in perf.
      
      According to Peter rmb()/mb() were added back then via a94d342b
      ("tools/perf: Add required memory barriers") at a time where kernel
      still supported chips that needed it, but nowadays support for these
      has been ditched completely, therefore we can fix them up as well.
      
      While for x86-64, replacing rmb() and mb() with smp_*() variants would
      result in just a compiler barrier for the former and LOCK + ADD for
      the latter (__sync_synchronize() uses slower MFENCE by the way), Peter
      suggested we can use smp_{load_acquire,store_release}() instead for
      architectures where its implementation doesn't resolve in slower smp_mb().
      Thus, e.g. in x86-64 we would be able to avoid CPU barrier entirely due
      to TSO. For architectures where the latter needs to use smp_mb() e.g.
      on arm, we stick to cheaper smp_rmb() variant for fetching the head.
      
      This work adds helpers ring_buffer_read_head() and ring_buffer_write_tail()
      for tools infrastructure that either switches to smp_load_acquire() for
      architectures where it is cheaper or uses READ_ONCE() + smp_rmb() barrier
      for those where it's not in order to fetch the data_head from the perf
      control page, and it uses smp_store_release() to write the data_tail.
      Latter is smp_mb() + WRITE_ONCE() combination or a cheaper variant if
      architecture allows for it. Those that rely on smp_rmb() and smp_mb() can
      further improve performance in a follow up step by implementing the two
      under tools/arch/*/include/asm/barrier.h such that they don't have to
      fallback to rmb() and mb() in tools/include/asm/barrier.h.
      
      Switch perf to use ring_buffer_read_head() and ring_buffer_write_tail()
      so it can make use of the optimizations. Later, we convert libbpf as
      well to use the same helpers.
      
      Side note [0]: the topic has been raised of whether one could simply use
      the C11 gcc builtins [1] for the smp_load_acquire() and smp_store_release()
      instead:
      
        __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
        __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
      
      Kernel and (presumably) tooling shipped along with the kernel has a
      minimum requirement of being able to build with gcc-4.6 and the latter
      does not have C11 builtins. While generally the C11 memory models don't
      align with the kernel's, the C11 load-acquire and store-release alone
      /could/ suffice, however. Issue is that this is implementation dependent
      on how the load-acquire and store-release is done by the compiler and
      the mapping of supported compilers must align to be compatible with the
      kernel's implementation, and thus needs to be verified/tracked on a
      case by case basis whether they match (unless an architecture uses them
      also from kernel side). The implementations for smp_load_acquire() and
      smp_store_release() in this patch have been adapted from the kernel side
      ones to have a concrete and compatible mapping in place.
      
        [0] http://patchwork.ozlabs.org/patch/985422/
        [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.htmlSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      09d62154
    • A
      perf evsel: Introduce per event max_events property · 2fda5ada
      Arnaldo Carvalho de Melo 提交于
      This simply adds the field to 'struct perf_evsel' and allows setting
      it via the event parser, to test it lets trace trace:
      
      First look at where in a function that receives an evsel we can put a probe
      to read how evsel->max_events was setup:
      
        # perf probe -x ~/bin/perf -L trace__event_handler
        <trace__event_handler@/home/acme/git/perf/tools/perf/builtin-trace.c:0>
              0  static int trace__event_handler(struct trace *trace, struct perf_evsel *evsel,
                                                union perf_event *event __maybe_unused,
                                                struct perf_sample *sample)
              3  {
              4         struct thread *thread = machine__findnew_thread(trace->host, sample->pid, sample->tid);
              5         int callchain_ret = 0;
      
              7         if (sample->callchain) {
              8                 callchain_ret = trace__resolve_callchain(trace, evsel, sample, &callchain_cursor);
              9                 if (callchain_ret == 0) {
             10                         if (callchain_cursor.nr < trace->min_stack)
             11                                 goto out;
             12                         callchain_ret = 1;
                                }
                        }
      
      See what variables we can probe at line 7:
      
        # perf probe -x ~/bin/perf -V trace__event_handler:7
        Available variables at trace__event_handler:7
                @<trace__event_handler+89>
                        int     callchain_ret
                        struct perf_evsel*      evsel
                        struct perf_sample*     sample
                        struct thread*  thread
                        struct trace*   trace
                        union perf_event*       event
      
      Add a probe at that line asking for evsel->max_events to be collected and named
      as "max_events":
      
        # perf probe -x ~/bin/perf trace__event_handler:7 'max_events=evsel->max_events'
        Added new event:
          probe_perf:trace__event_handler (on trace__event_handler:7 in /home/acme/bin/perf with max_events=evsel->max_events)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe_perf:trace__event_handler -aR sleep 1
      
      Now use 'perf trace', here aliased to just 'trace' and trace trace, i.e.
      the first 'trace' is tracing just that 'probe_perf:trace__event_handler' event,
      while the traced trace is tracing all scheduler tracepoints, will stop at two
      events (--max-events 2) and will just set evsel->max_events for all the sched
      tracepoints to 9, we will see the output of both traces intermixed:
      
        # trace -e *perf:*event_handler trace --max-events 2 -e sched:*/nr=9/
             0.000 :0/0 sched:sched_waking:comm=rcu_sched pid=10 prio=120 target_cpu=000
             0.009 :0/0 sched:sched_wakeup:comm=rcu_sched pid=10 prio=120 target_cpu=000
             0.000 trace/23949 probe_perf:trace__event_handler:(48c34a) max_events=0x9
             0.046 trace/23949 probe_perf:trace__event_handler:(48c34a) max_events=0x9
        #
      
      Now, if the traced trace sends its output to /dev/null, we'll see just
      what the first level trace outputs: that evsel->max_events is indeed
      being set to 9:
      
        # trace -e *perf:*event_handler trace -o /dev/null --max-events 2 -e sched:*/nr=9/
             0.000 trace/23961 probe_perf:trace__event_handler:(48c34a) max_events=0x9
             0.030 trace/23961 probe_perf:trace__event_handler:(48c34a) max_events=0x9
        #
      
      Now that we can set evsel->max_events, we can go to the next step, honour that
      per-event property in 'perf trace'.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: https://lkml.kernel.org/n/tip-og00yasj276joem6e14l1eas@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      2fda5ada
  12. 19 10月, 2018 1 次提交
  13. 18 10月, 2018 2 次提交