1. 24 Oct, 2017 2 commits
  2. 03 Oct, 2017 1 commit
    • perf top: Implement multithreading for perf_event__synthesize_threads · 340b47f5
      Kan Liang committed
      The proc files, which are sorted in alphabetical order, are evenly
      assigned to several synthesize threads to be processed in parallel.
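
The even split described above can be sketched as follows. This is a minimal illustration and not the patch's code: `struct chunk` and `split_evenly()` are hypothetical names, and giving the remainder to the first workers is an assumption about how "evenly" is resolved when the count does not divide exactly.

```c
#include <stddef.h>

/* Hypothetical: a contiguous slice [start, start + num) of the sorted /proc entries. */
struct chunk {
	size_t start;
	size_t num;
};

/*
 * Split n sorted entries across thread_nr workers so that slice sizes
 * differ by at most one. Assumption: the first (n % thread_nr) workers
 * each take one extra entry. Returns -1 if thread_nr is zero.
 */
int split_evenly(size_t n, size_t thread_nr, struct chunk *out)
{
	size_t per_thread, extra, base = 0;

	if (thread_nr == 0)
		return -1;

	per_thread = n / thread_nr;
	extra = n % thread_nr;

	for (size_t i = 0; i < thread_nr; i++) {
		out[i].start = base;
		out[i].num = per_thread + (i < extra ? 1 : 0);
		base += out[i].num;
	}
	return 0;
}
```

Each worker would then process only its own slice, so no two workers touch the same proc entry.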
      
      For 'perf top', the number of threads is hard-coded to the number of
      online CPUs. A following patch will introduce an option to set it.
      
      For other perf tools, the thread number is 1, because the process
      function (e.g. process_synthesized_event) is not ready for
      multithreading.
      
      This patch series only supports multithreaded event synthesis for 'perf
      top'. For other tools, it can be done separately later.
      
      With multithreading applied, the total processing time gets up to a
      1.56x speedup on Knights Mill for 'perf top'.
      
      For the processing of a specific single event, the processing time
      could increase because of lock contention, so proc_map_timeout may need
      to be increased. Otherwise some proc maps will be truncated.
      
      Based on my test, increasing proc_map_timeout has a small impact on the
      total processing time. The total processing time still gets a 1.49x
      speedup on Knights Mill after increasing proc_map_timeout.
      The patch itself doesn't increase proc_map_timeout.
      
      There is no need to implement multithreading for per-task monitoring
      (perf_event__synthesize_thread_map), since it doesn't have a
      performance issue.
      
      Committer testing:
      
        # getconf _NPROCESSORS_ONLN
        4
        # perf trace --no-inherit -e clone -o /tmp/output perf top
        # tail -4 /tmp/output
           0.124 ( 0.041 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eb3a8f30, parent_tidptr: 0x7fc3eb3a99d0, child_tidptr: 0x7fc3eb3a99d0, tls: 0x7fc3eb3a9700) = 9548 (perf)
           0.246 ( 0.023 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3eaba7f30, parent_tidptr: 0x7fc3eaba89d0, child_tidptr: 0x7fc3eaba89d0, tls: 0x7fc3eaba8700) = 9549 (perf)
           0.286 ( 0.019 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9550 (perf)
         246.540 ( 0.047 ms): clone(flags: VM|FS|FILES|SIGHAND|THREAD|SYSVSEM|SETTLS|PARENT_SETTID|CHILD_CLEARTID, child_stack: 0x7fc3ea3a6f30, parent_tidptr: 0x7fc3ea3a79d0, child_tidptr: 0x7fc3ea3a79d0, tls: 0x7fc3ea3a7700) = 9551 (perf)
        #
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1506696477-146932-4-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  3. 22 Sep, 2017 1 commit
    • perf tools: Provide mutex wrappers for pthreads rwlocks · 0a7c74ea
      Arnaldo Carvalho de Melo committed
      Andi reported a performance drop in single threaded perf tools such as
      'perf script' due to the growing number of locks being put in place to
      allow for multithreaded tools, so wrap the POSIX threads rwlock routines
      with the names used for such kinds of locks in the Linux kernel and then
      allow for tools to ask for those locks to be used or not.
      
      I.e. a tool may have a multithreaded phase and then switch to single
      threaded, like the upcoming patches for the synthesizing of
      PERF_RECORD_{FORK,MMAP,etc} for pre-existing processes to then switch to
      single threaded mode in 'perf top'.
      
      The init routines will not be conditional, this way starting as single
      threaded to then move to multi threaded mode should be possible.
      Reported-by: Andi Kleen <ak@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/20170404161739.GH12903@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 18 Sep, 2017 2 commits
  5. 02 Sep, 2017 1 commit
  6. 30 Aug, 2017 1 commit
    • perf report: Calculate the average cycles of iterations · c4ee0625
      Jin Yao committed
      The branch history code has a loop detection function. With this, we can
      get the number of iterations by calculating the removed loops.
      
      It would also be nice to know the average cycles per iteration.
      This patch adds up the cycles in the branch entries of removed loops and
      saves the result to the next branch entry (e.g. branch entry A).
      
      Finally it will display the iteration number and average cycles at the
      "from" of branch entry A.
      
      For example:
      perf record -g -j any,save_type ./div
      perf report --branch-history --no-children --stdio
      
      --22.63%--main div.c:42 (RET CROSS_2M)
                compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
                |
                 --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                           rand rand.c:28 (cycles:1)
                           rand rand.c:28 (RET CROSS_2M)
                           __random random.c:298 (cycles:1)
                           __random random.c:297 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (RET CROSS_2M)
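
The accumulation behind `iter` and `avg_cycles` can be sketched as below. The struct and function names are hypothetical, and the use of integer division is an assumption; the sketch only shows the idea of summing cycles over removed loop entries and dividing by the iteration count.

```c
#include <stdint.h>

/* Hypothetical per-entry accumulator for a collapsed loop. */
struct loop_stats {
	uint64_t iter;    /* number of iterations recorded */
	uint64_t cycles;  /* cycles summed over the removed loop entries */
};

/* Add the cycles of one removed loop iteration. */
void loop_stats_add(struct loop_stats *s, uint64_t cycles)
{
	s->iter++;
	s->cycles += cycles;
}

/* Average cycles per iteration; 0 when no iteration was recorded. */
uint64_t loop_stats_avg(const struct loop_stats *s)
{
	return s->iter ? s->cycles / s->iter : 0;
}
```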
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  7. 12 Aug, 2017 1 commit
    • perf record: Fix wrong size in perf_record_mmap for last kernel module · 9ad4652b
      Thomas Richter committed
      During work on perf report for s390 I ran into the following issue:
      
      0 0x318 [0x78]: PERF_RECORD_MMAP -1/0:
              [0x3ff804d6990(0xfffffc007fb2966f) @ 0]:
              x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko
      
      This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid
      module size for qeth_l2.ko (the s390 ethernet device driver).
      
      Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory.
      
      It turned out that this wrong size is created by the perf record
      command.  What happens is this function call sequence from
      __cmd_record():
      
        perf_session__new():
          perf_session__create_kernel_maps():
            machine__create_kernel_maps():
              machine__create_modules():   Creates map for all loaded kernel modules.
                modules__parse():   Reads /proc/modules and extracts module name and
                                    load address (1st and last column)
                  machine__create_module():   Called for every module found in /proc/modules.
                                    Creates a new map for every module found and enters
                                    module name and start address into the map. Since the
                                    module end address is unknown it is set to zero.
      
      This ends up with a kernel module map list sorted by module start
      addresses.  All module end addresses are zero.
      
      Finally, machine__create_kernel_maps() calls function map_groups__fixup_end().
      This function iterates through the maps and assigns each map entry's
      end address the successor map entry start address. The last entry of the
      map group has no successor, so ~0 is used as end to consume the remaining
      memory.
      
      Later __cmd_record calls function record__synthesize() which in turn calls
      perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules()
      to create PERF_RECORD_MMAP entries in the perf.data file.
      
      On s390 this results in the last module qeth_l2.ko
      (which has highest start address, see module table:
              [root@s8360047 perf]# cat /proc/modules
              qeth_l2 86016 1 - Live 0x000003ff804d6000
              qeth 266240 1 qeth_l2, Live 0x000003ff80296000
              ccwgroup 24576 1 qeth, Live 0x000003ff80218000
              vmur 36864 0 - Live 0x000003ff80182000
              qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000
              [root@s8360047 perf]# )
      to be the last entry and its map has an end address of ~0.
      
      When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko
      its start address and length are written. The length is calculated in the line:
          event->mmap.len   = pos->end - pos->start;
      and results in 0xffffffffffffffff - 0x3ff804d6990(*) = 0xfffffc007fb2966f
      
      (*) On s390 the module start address is actually determined by a __weak function
      named arch__fix_module_text_start() in machine__create_module().
      
      I think this is improvable. We can use the module size (2nd column of /proc/modules)
      to get each loaded kernel module size and calculate its end address.
      Only for map entries which do not have a valid end address (end is still zero)
      do we use the heuristic we have now, that is, use the successor start address or ~0.
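
The proposed change can be sketched as below, under the assumption of a simplified map type. `struct mod_map` and both functions are hypothetical stand-ins, not the perf structures; the sketch only reproduces the arithmetic from the description: `end = start + size` when the size is known, and the successor-or-`~0` fallback otherwise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a module map. */
struct mod_map {
	uint64_t start;
	uint64_t end;   /* 0 means "not known yet" */
};

/* Set the end address from the size column of /proc/modules. */
void mod_map_set_end(struct mod_map *m, uint64_t size)
{
	m->end = m->start + size;
}

/*
 * Fallback heuristic for maps sorted by start address: only maps whose end
 * is still zero get the successor's start, or ~0 for the last map.
 */
void mod_maps_fixup_end(struct mod_map *maps, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (maps[i].end != 0)
			continue;
		maps[i].end = (i + 1 < nr) ? maps[i + 1].start : ~(uint64_t)0;
	}
}
```

With the size of qeth_l2 (86016 bytes) applied first, the fallback leaves that map alone and its length stays 86016 instead of 0xfffffc007fb2966f.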
      Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 26 Jul, 2017 1 commit
    • perf report: Make --branch-history work without callgraphs(-g) option in perf record · b49a821e
      Jin Yao committed
        perf record -b -g <command>
        perf report --branch-history
      
      This merges the LBRs with the callgraphs.
      
      However it would be nice if it also works without callgraphs (-g) set in
      perf record, so that only the LBRs are displayed.  But currently perf
      report errors in this case. For example,
      
        perf record -b <command>
        perf report --branch-history
      
        Error:
        Selected -g or --branch-history but no callchain data. Did
        you call 'perf record' without -g?
      
      This patch displays the LBRs only even if callgraphs(-g) is not enabled
      in perf record.
      
      Change log:
      
      v2: According to Milian Wolff's comment, change the obsolete error
      message. Now the error message is:
      
                       ┌─Error:─────────────────────────────────────┐
                       │Selected -g or --branch-history.            │
                       │But no callchain or branch data.            │
                       │Did you call 'perf record' without -g or -b?│
                       │                                            │
                       │                                            │
                       │Press any key...                            │
                       └────────────────────────────────────────────┘
      
      When passing the last parameter to hists__fprintf,
      changes "|" to "||".
      
        hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
                       symbol_conf.use_callchain || symbol_conf.show_branchflag_count);
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 19 Jul, 2017 3 commits
  10. 12 Jul, 2017 1 commit
    • perf symbols: Accept zero as the kernel base address · 4b1303d0
      Arnaldo Carvalho de Melo committed
      Which is the case in S/390, where symbols were not being resolved
      because machine__get_kernel_start was only setting machine->kernel_start
      when the just successfully loaded kernel symtab had its map->start set
      to !0, when it was left at (1ULL << 63) assuming a partitioning of the
      address space for user/kernel, which is not the case in S/390 nor in
      Sparc.
      
      So just check if map__load() was successful and set
      machine->kernel_start to zero, fixing kernel symbol resolution on S/390.
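
The logic described here can be sketched as follows. The types and the function are hypothetical, simplified stand-ins rather than the perf structures; the point is that the decision depends on whether loading succeeded and not on whether the start address is non-zero.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the perf structures. */
struct kmap {
	uint64_t start;
	int load_err;   /* result of loading the symtab: 0 on success */
};

struct kmachine {
	uint64_t kernel_start;
};

/*
 * Decide the kernel start address from the load result instead of from
 * whether the start address is non-zero, so that a start of 0 is accepted.
 * The (1ULL << 63) default is kept when there is no map or loading failed.
 */
int kmachine_get_kernel_start(struct kmachine *m, const struct kmap *map)
{
	m->kernel_start = 1ULL << 63;
	if (!map)
		return 0;
	if (map->load_err == 0)
		m->kernel_start = map->start;
	return map->load_err;
}
```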
      
      Test performed by Thomas:
      
       ----
      
        I like this patch. I have done a new build and removed all my debug output to start
        from scratch. Without your patch I get this:
      
        # Samples: 4  of event 'cpu-clock'
        # Event count (approx.): 1000000
        #
        # Children      Self  Command  Shared Object     Symbol
        # ........  ........  .......  ................  ........................
            75.00%     0.00%  true     [unknown]         [k] 0x00000000004bedda
                    |
                    ---0x4bedda
                       |
                       |--50.00%--0x42693a
                       |          |
                       |           --25.00%--0x2a72e0
                       |                     0x2af0ca
                       |                     0x3d1003fe4c0
                       |
                        --25.00%--0x4272bc
                                  0x26fa84
      
        and with your patch (I just rebuilt the perf tool, nothing else and used the same
        perf.data file as input):
      
        # Samples: 4  of event 'cpu-clock'
        # Event count (approx.): 1000000
        #
        # Children      Self  Command  Shared Object               Symbol
        # ........  ........  .......  ..........................  ..................................
            75.00%     0.00%  true     [kernel.vmlinux]            [k] pgm_check_handler
                    |
                    ---pgm_check_handler
                       do_dat_exception
                       handle_mm_fault
                       __handle_mm_fault
                       filemap_map_pages
                       |
                       |--25.00%--rcu_read_lock_held
                       |          rcu_lockdep_current_cpu_online
                       |          0x3d1003ff4c0
                       |
                        --25.00%--lock_release
      
        Looks good to me....
       ----
      Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      Link: http://lkml.kernel.org/n/tip-dk0n1uzmbe0tbthrpfqlx6bz@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 26 Jun, 2017 1 commit
    • perf machine: Fix segfault for kernel.kptr_restrict=2 · 3f938ee2
      Jiri Olsa committed
      Michael reported a segfault when kernel.kptr_restrict=2 is set.
      
        $ perf record ls
        ...
        perf: Segmentation fault
        Obtained 16 stack frames.
        ./perf(dump_stack+0x2d) [0x5068df]
        ./perf(sighandler_dump_stack+0x2d) [0x5069bf]
        ./perf() [0x43e47b]
        /lib64/libc.so.6(+0x3594f) [0x7f762004794f]
        /lib64/libc.so.6(strlen+0x26) [0x7f762009ef86]
        /lib64/libc.so.6(__strdup+0xd) [0x7f762009ecbd]
        ./perf(maps__set_kallsyms_ref_reloc_sym+0x4d) [0x51590f]
        ./perf(machine__create_kernel_maps+0x136) [0x50a7de]
        ./perf(perf_session__create_kernel_maps+0x2c) [0x510a81]
        ./perf(perf_session__new+0x13d) [0x510e23]
        ./perf() [0x43fd61]
        ./perf(cmd_record+0x704) [0x441823]
        ./perf() [0x4bc1a0]
        ./perf() [0x4bc40d]
        ./perf() [0x4bc55f]
        ./perf(main+0x2d5) [0x4bc939]
        Segmentation fault (core dumped)
      
      The reason is that with kernel.kptr_restrict=2, we don't get
      the symbol from machine__get_running_kernel_start, which we
      want to use in maps__set_kallsyms_ref_reloc_sym and we crash.
      
      Check the symbol name value before calling
      maps__set_kallsyms_ref_reloc_sym() and succeed without ref_reloc_sym
      being set. It's safe because we check its existence before we use it.
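
A minimal sketch of the guard, using a hypothetical `struct ref_reloc` in place of the real perf data structures. The backtrace above shows the crash inside strlen() called from strdup(), which is what happens when a NULL name is duplicated; the sketch checks the name first and treats a missing name as success.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the reference relocation symbol name. */
struct ref_reloc {
	char *name;   /* NULL when no reference symbol is available */
};

/*
 * Set the reference symbol only when a name was found. When no name is
 * available, succeed and leave it unset instead of copying a NULL string.
 * Returns -1 on allocation failure.
 */
int ref_reloc_set(struct ref_reloc *r, const char *name)
{
	size_t len;

	r->name = NULL;
	if (!name)
		return 0;

	len = strlen(name) + 1;
	r->name = malloc(len);
	if (!r->name)
		return -1;
	memcpy(r->name, name, len);
	return 0;
}
```

Callers then have to check `r->name` for NULL before using it, which matches the commit's remark that existence is checked before use.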
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170626095153.553-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  12. 06 Jun, 2017 1 commit
  13. 03 May, 2017 1 commit
    • perf symbols: Accept symbols starting at address 0 · b843f62a
      Arnaldo Carvalho de Melo committed
      That is the case of _text on s390, and we have some functions that return an
      address, using address zero to report problems, oops.
      
      This would lead the symbol loading routines to not use "_text" as the reference
      relocation symbol, or the first symbol for the kernel, but use instead
      "_stext", that is at the same address on x86_64 and others, but not on s390:
      
        [acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
        0000000000000000 T _text
        0000000000000418 t iplstart
        0000000000000800 T start
        000000000000080a t .base
        000000000000082e t .sk8x8
        0000000000000834 t .gotr
        0000000000000842 t .cmd
        0000000000000846 t .parm
        000000000000084a t .lowcase
        0000000000010000 T startup
        0000000000010010 T startup_kdump
        0000000000010214 t startup_kdump_relocated
        0000000000011000 T startup_continue
        00000000000112a0 T _ehead
        0000000000100000 T _stext
        [acme@localhost perf-4.11.0-rc6]$
      
      Which in turn would make 'perf test vmlinux' fail because it wouldn't find
      the symbols before "_stext" in kallsyms.
      
      Fix it by using the return value only for errors and storing the
      address, when the symbol is successfully found, in a provided pointer
      arg.
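
The pattern of the fix can be sketched as below, with a hypothetical symbol table and lookup function (not the perf API). The return value carries only the status, and the address is stored through a pointer argument, so address 0 is a valid result.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry, mirroring a /proc/kallsyms line. */
struct ksym {
	const char *name;
	uint64_t addr;
};

/*
 * Look up a symbol by name. The return value carries only the status
 * (0 found, -1 not found); the address goes to *addr, so a symbol at
 * address 0 is no longer mistaken for a failure.
 */
int ksym_find(const struct ksym *tab, size_t nr, const char *name, uint64_t *addr)
{
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(tab[i].name, name) == 0) {
			*addr = tab[i].addr;
			return 0;
		}
	}
	return -1;
}
```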
      
      Before this patch:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 40693
        Looking at the vmlinux_path (8 entries long)
        Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
        ERR : 0: _text not on kallsyms
        ERR : 0x418: iplstart not on kallsyms
        ERR : 0x800: start not on kallsyms
        ERR : 0x80a: .base not on kallsyms
        ERR : 0x82e: .sk8x8 not on kallsyms
        ERR : 0x834: .gotr not on kallsyms
        ERR : 0x842: .cmd not on kallsyms
        ERR : 0x846: .parm not on kallsyms
        ERR : 0x84a: .lowcase not on kallsyms
        ERR : 0x10000: startup not on kallsyms
        ERR : 0x10010: startup_kdump not on kallsyms
        ERR : 0x10214: startup_kdump_relocated not on kallsyms
        ERR : 0x11000: startup_continue not on kallsyms
        ERR : 0x112a0: _ehead not on kallsyms
        <SNIP warnings>
        test child finished with -1
        ---- end ----
        vmlinux symtab matches kallsyms: FAILED!
        [acme@localhost perf-4.11.0-rc6]$
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 47160
        <SNIP warnings>
        test child finished with 0
        ---- end ----
        vmlinux symtab matches kallsyms: Ok
        [acme@localhost perf-4.11.0-rc6]$
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 25 Apr, 2017 1 commit
  15. 20 Apr, 2017 5 commits
  16. 14 Mar, 2017 1 commit
    • perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info · f3b3614a
      Hari Bathini committed
      Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
      by the kernel when fork, clone, setns or unshare are invoked. And update
      perf-record documentation with the new option to record namespace
      events.
      
      Committer notes:
      
      Combined it with a later patch to allow printing it via 'perf report -D'
      and be able to test the feature introduced in this patch. Had to move
      here also perf_ns__name(), that was introduced in another later patch.
      
      Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
      
        util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
           ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
                                               ^
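
A small sketch of the portable form, assuming a hypothetical helper `format_ns()` (not a perf function). The `PRIu64` and `PRIx64` macros from `<inttypes.h>` expand to the correct length modifier for `uint64_t` on each platform, which avoids the `%lx` mismatch shown in the error above.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format one namespace entry as "idx/name: dev/0xino".
 * PRIu64 and PRIx64 match uint64_t whether it is 'unsigned long' or
 * 'unsigned long long' on the target.
 */
int format_ns(char *buf, size_t len, unsigned int idx, const char *name,
	      uint64_t dev, uint64_t ino)
{
	return snprintf(buf, len, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```

For instance, `format_ns(buf, sizeof(buf), 0, "net", 3, 0xf0000081)` writes `0/net: 3/0xf0000081` into `buf`.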
      Testing it:
      
        # perf record --namespaces -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
        #
        # perf report -D
        <SNIP>
        3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
                      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      
        0x1151e0 [0x30]: event: 9
        .
        . ... raw event: size 48 bytes
        .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
        .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
        .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
        <SNIP>
              NAMESPACES events:          1
        <SNIP>
        #
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  17. 04 Mar, 2017 1 commit
  18. 15 Feb, 2017 1 commit
  19. 14 Feb, 2017 1 commit
  20. 12 Jan, 2017 1 commit
  21. 15 Nov, 2016 1 commit
    • perf report: Add branch flag to callchain cursor node · 410024db
      Jin Yao committed
      Since the branch ip has been added to the call stack for easier
      browsing, this patch adds more branch information. For example, it adds
      a flag to indicate whether this ip is a branch, and also adds the
      branch flags.
      
      Then we can know whether the cursor node represents a branch and which
      branch flags it has.
      
      The branch history code has a loop detection pass that removes loops. It
      would be nice to know how many loops were removed so that, in later
      steps, we can compute the average number of iterations.
      
      For example:
      
      Before remove_loops(),
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x300, to = 0x250
      entry3: from = 0x300, to = 0x250
      entry4: from = 0x700, to = 0x800
      
      After remove_loops()
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x700, to = 0x800
      
      The original entry2 and entry3 are removed. So the number of iterations
      (from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1).
      
      iterations = removed number + 1;
      average iterations = Sum(iterations) / number of samples
      
      This formula ignores other cases, for example iterations that cross
      multiple buffers, or one buffer that contains 2+ loops, because in
      practice it's good enough.
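
The counting in the example above can be sketched as below. This is a deliberate simplification and not the loop detection pass itself: the hypothetical `collapse_repeats()` only collapses runs of adjacent identical entries (loops made of a single branch) and reports how many entries were removed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified branch entry. */
struct branch {
	uint64_t from;
	uint64_t to;
};

/*
 * Collapse runs of adjacent identical entries in place and count how many
 * entries were removed in total. Returns the new number of entries.
 * Simplification: only loops consisting of a single branch are handled.
 */
size_t collapse_repeats(struct branch *e, size_t nr, size_t *removed)
{
	size_t out = 0;

	*removed = 0;
	for (size_t i = 0; i < nr; i++) {
		if (out > 0 && e[out - 1].from == e[i].from &&
		    e[out - 1].to == e[i].to) {
			(*removed)++;
			continue;
		}
		e[out++] = e[i];
	}
	return out;
}
```

Applied to the five entries of the example, this leaves three entries with two removed, so the iteration count for (from = 0x300, to = 0x250) is 2 + 1 = 3.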
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
      [ Renamed 'iter' to 'nr_loop_iter' for clarity ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  22. 03 Oct, 2016 1 commit
    • perf tools: Experiment with cppcheck · 18ef15c6
      Arnaldo Carvalho de Melo committed
      Experimenting a bit using cppcheck[1], a static checker brought to my
      attention by Colin, reducing the scope of some variables and reducing the
      number of source code lines in the process:
      
        $ cppcheck --enable=style tools/perf/util/thread.c
        Checking tools/perf/util/thread.c...
        [tools/perf/util/thread.c:17]: (style) The scope of the variable 'leader' can be reduced.
        [tools/perf/util/thread.c:133]: (style) The scope of the variable 'err' can be reduced.
        [tools/perf/util/thread.c:273]: (style) The scope of the variable 'err' can be reduced.
      
      Will continue later, but these are already useful, keep them.
      
      1: https://sourceforge.net/p/cppcheck/wiki/Home/
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ixws7lbycihhpmq9cc949ti6@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  23. 05 Sep, 2016 2 commits
  24. 27 Jul, 2016 1 commit
  25. 22 Jun, 2016 1 commit
  26. 07 Jun, 2016 1 commit
    • perf unwind: Move unwind__prepare_access from thread_new into thread__insert_map · 8132a2a8
      He Kuang committed
      To determine the libunwind methods to use, we should get the
      32bit/64bit information from maps of a thread. When a thread is newly
      created, the information is not prepared. This patch moves
      unwind__prepare_access() into thread__insert_map() so we can get the
      information we need from maps. Meanwhile, let thread__insert_map()
      return a value and show messages on error.
      Signed-off-by: He Kuang <hekuang@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1464924803-22214-5-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  27. 20 May, 2016 3 commits
  28. 17 May, 2016 2 commits