1. 18 Sep, 2017 1 commit
    • perf machine: Use hashtable for machine threads · 91e467bc
      Committed by Kan Liang
      To process any event, perf first needs to find the thread in the machine.
      The machine maintains an rb tree to store all threads. The rb tree is
      protected by a rw lock.
      
      This is not a problem for current perf, which processes events serially.
      However, it becomes a scalability issue when events are processed in
      parallel, especially on a heavily loaded system with many threads.
      
      Introduce a hashtable to divide the big rb tree into many small rb trees
      for threads. The index is thread id % hashtable size. This reduces
      lock contention.
      
      Committer notes:
      
      Renamed some variables and function names to reduce semantic confusion:
      
        'struct threads' pointers: thread -> threads
        threads hashtable index: tid -> hash_bucket
        struct threads *machine__thread() -> machine__threads()
        Cast tid to (unsigned int) to handle -1 in machine__threads() (Kan Liang)
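
      The bucket selection described above can be sketched as follows. This
      is a minimal illustration, not the code from the patch: the table
      size, the hash_bucket_of() and machine_add_thread() helpers and the
      simplified struct layouts are assumptions, and a plain counter stands
      in for the per-bucket rb tree and rw lock.

```c
#include <assert.h>
#include <sys/types.h>

/* Hypothetical table size; the commit message does not state the real one. */
#define THREADS_TABLE_SIZE 256

/*
 * One bucket. In perf each bucket would hold its own small rb tree and
 * its own rw lock; a plain counter stands in for both here.
 */
struct threads {
	unsigned int nr;
};

struct machine {
	struct threads threads[THREADS_TABLE_SIZE];
};

/* Cast to unsigned first so that tid == -1 still yields a valid index. */
unsigned int hash_bucket_of(pid_t tid)
{
	return (unsigned int)tid % THREADS_TABLE_SIZE;
}

/* Return the bucket responsible for tid. */
struct threads *machine_threads(struct machine *machine, pid_t tid)
{
	assert(machine);
	return &machine->threads[hash_bucket_of(tid)];
}

/* Record one more thread in the bucket that tid hashes to. */
void machine_add_thread(struct machine *machine, pid_t tid)
{
	struct threads *threads = machine_threads(machine, tid);

	threads->nr++;
}
```

      Without the cast, -1 % 256 would be negative in C and index outside
      the array, which is why the committer notes mention it.
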
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Lukasz Odzioba <lukasz.odzioba@intel.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1505096603-215017-2-git-send-email-kan.liang@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  2. 02 Sep, 2017 1 commit
  3. 30 Aug, 2017 1 commit
    • perf report: Calculate the average cycles of iterations · c4ee0625
      Committed by Jin Yao
      The branch history code has a loop detection function. With this, we can
      get the number of iterations by calculating the removed loops.
      
      It would also be nice to know the average cycles per iteration.
      This patch adds up the cycles in the branch entries of removed loops and
      saves the result to the next branch entry (e.g. branch entry A).
      
      Finally, it displays the iteration number and average cycles at the
      "from" of branch entry A.
      
      For example:
      perf record -g -j any,save_type ./div
      perf report --branch-history --no-children --stdio
      
      --22.63%--main div.c:42 (RET CROSS_2M)
                compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
                |
                 --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                           rand rand.c:28 (cycles:1)
                           rand rand.c:28 (RET CROSS_2M)
                           __random random.c:298 (cycles:1)
                           __random random.c:297 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (COND_BWD CROSS_2M)
                           __random random.c:295 (cycles:1)
                           __random random.c:295 (RET CROSS_2M)
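
      The accounting described above can be sketched as follows. The struct
      layout and the helper names are hypothetical, and the sketch assumes
      the average is the summed cycles of the removed entries divided by
      the number of removed entries; the patch itself may count
      differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified branch entry; not perf's actual layout. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
	uint64_t cycles;       /* cycles reported for this branch */
	uint64_t nr_loop_iter; /* loop entries folded into this entry */
	uint64_t iter_cycles;  /* cycles summed over those loop entries */
};

/* Fold nr_removed removed loop entries into the entry that follows them. */
void account_removed_loop(struct branch_entry *next,
			  const struct branch_entry *removed, int nr_removed)
{
	assert(next && removed);
	for (int i = 0; i < nr_removed; i++)
		next->iter_cycles += removed[i].cycles;
	next->nr_loop_iter += (uint64_t)nr_removed;
}

/* Average cycles per folded iteration, or 0 when nothing was folded. */
uint64_t avg_cycles(const struct branch_entry *entry)
{
	if (entry->nr_loop_iter == 0)
		return 0;
	return entry->iter_cycles / entry->nr_loop_iter;
}
```
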
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 12 Aug, 2017 1 commit
    • perf record: Fix wrong size in perf_record_mmap for last kernel module · 9ad4652b
      Committed by Thomas Richter
      During work on perf report for s390 I ran into the following issue:
      
      0 0x318 [0x78]: PERF_RECORD_MMAP -1/0:
              [0x3ff804d6990(0xfffffc007fb2966f) @ 0]:
              x /lib/modules/4.12.0perf1+/kernel/drivers/s390/net/qeth_l2.ko
      
      This is a PERF_RECORD_MMAP entry of the perf.data file with an invalid
      module size for qeth_l2.ko (the s390 ethernet device driver).
      
      Even a mainframe does not have 0xfffffc007fb2966f bytes of main memory.
      
      It turned out that this wrong size is created by the perf record
      command.  What happens is this function call sequence from
      __cmd_record():
      
        perf_session__new():
          perf_session__create_kernel_maps():
            machine__create_kernel_maps():
              machine__create_modules():   Creates map for all loaded kernel modules.
                modules__parse():   Reads /proc/modules and extracts module name and
                                    load address (1st and last column)
                  machine__create_module():   Called for every module found in /proc/modules.
                                    Creates a new map for every module found and enters
                                    module name and start address into the map. Since the
                                    module end address is unknown it is set to zero.
      
      This ends up with a kernel module map list sorted by module start
      addresses.  All module end addresses are zero.
      
      Finally, machine__create_kernel_maps() calls function map_groups__fixup_end().
      This function iterates through the maps and assigns each map entry's
      end address the successor map entry start address. The last entry of the
      map group has no successor, so ~0 is used as end to consume the remaining
      memory.
      
      Later __cmd_record calls function record__synthesize() which in turn calls
      perf_event__synthesize_kernel_mmap() and perf_event__synthesize_modules()
      to create PERF_RECORD_MMAP entries in the perf.data file.
      
      On s390 this results in the last module qeth_l2.ko
      (which has highest start address, see module table:
              [root@s8360047 perf]# cat /proc/modules
              qeth_l2 86016 1 - Live 0x000003ff804d6000
              qeth 266240 1 qeth_l2, Live 0x000003ff80296000
              ccwgroup 24576 1 qeth, Live 0x000003ff80218000
              vmur 36864 0 - Live 0x000003ff80182000
              qdio 143360 2 qeth_l2,qeth, Live 0x000003ff80002000
              [root@s8360047 perf]# )
      to be the last entry and its map has an end address of ~0.
      
      When the PERF_RECORD_MMAP entry is created for kernel module qeth_l2.ko
      its start address and length are written. The length is calculated in line:
          event->mmap.len   = pos->end - pos->start;
      and results in 0xffffffffffffffff - 0x3ff804d6990(*) = 0xfffffc007fb2966f
      
      (*) On s390 the module start address is actually determined by a __weak function
      named arch__fix_module_text_start() in machine__create_module().
      
      I think this is improvable. We can use the module size (2nd column of /proc/modules)
      to get each loaded kernel module size and calculate its end address.
      Only for map entries which do not have a valid end address (end is still zero)
      we can use the heuristic we have now, that is use successor start address or ~0.
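
      The proposal can be sketched as follows. The struct and function
      names here are simplified stand-ins chosen for illustration, not
      perf's actual definitions; the sketch only shows the two steps named
      above: set the end from the module size when it is known, and fall
      back to the successor start address or ~0 only where the end is still
      zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a kernel module map. */
struct map {
	uint64_t start;
	uint64_t end; /* 0 means "not known yet" */
};

/* Preferred path: the size comes from the 2nd column of /proc/modules. */
void map_set_end_from_size(struct map *map, uint64_t size)
{
	assert(map);
	map->end = map->start + size;
}

/*
 * Fallback heuristic for maps sorted by start address: only entries
 * whose end is still zero get the successor's start, or ~0 for the
 * last entry.
 */
void maps_fixup_end(struct map *maps, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		if (maps[i].end != 0)
			continue;
		maps[i].end = (i + 1 < nr) ? maps[i + 1].start : ~0ULL;
	}
}
```

      With the qeth_l2 values from the module table (start
      0x3ff804d6000, size 86016), the first helper gives an end of
      0x3ff804eb000 instead of ~0.
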
      Signed-off-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Reviewed-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      LPU-Reference: 20170803134902.47207-2-tmricht@linux.vnet.ibm.com
      Link: http://lkml.kernel.org/n/tip-nmoqij5b5vxx7rq2ckwu8iaj@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  5. 26 Jul, 2017 1 commit
    • perf report: Make --branch-history work without callgraphs(-g) option in perf record · b49a821e
      Committed by Jin Yao
        perf record -b -g <command>
        perf report --branch-history
      
      This merges the LBRs with the callgraphs.
      
      However it would be nice if it also works without callgraphs (-g) set in
      perf record, so that only the LBRs are displayed.  But currently perf
      report errors in this case. For example,
      
        perf record -b <command>
        perf report --branch-history
      
        Error:
        Selected -g or --branch-history but no callchain data. Did
        you call 'perf record' without -g?
      
      This patch displays the LBRs only even if callgraphs(-g) is not enabled
      in perf record.
      
      Change log:
      
      v2: According to Milian Wolff's comment, change the obsolete error
      message. Now the error message is:
      
                       ┌─Error:─────────────────────────────────────┐
                       │Selected -g or --branch-history.            │
                       │But no callchain or branch data.            │
                       │Did you call 'perf record' without -g or -b?│
                       │                                            │
                       │                                            │
                       │Press any key...                            │
                       └────────────────────────────────────────────┘
      
      When passing the last parameter to hists__fprintf(),
      change "|" to "||":
      
        hists__fprintf(hists, !quiet, 0, 0, rep->min_percent, stdout,
                       symbol_conf.use_callchain || symbol_conf.show_branchflag_count);
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1494240182-28899-1-git-send-email-yao.jin@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  6. 19 Jul, 2017 3 commits
  7. 12 Jul, 2017 1 commit
    • perf symbols: Accept zero as the kernel base address · 4b1303d0
      Committed by Arnaldo Carvalho de Melo
      Which is the case in S/390, where symbols were not being resolved
      because machine__get_kernel_start was only setting machine->kernel_start
      when the just successfully loaded kernel symtab had its map->start set
      to !0, when it was left at (1ULL << 63) assuming a partitioning of the
      address space for user/kernel, which is not the case in S/390 nor in
      Sparc.
      
      So just check if map__load() was successful and set
      machine->kernel_start to zero, fixing kernel symbol resolution on S/390.
      
      Test performed by Thomas:
      
       ----
      
        I like this patch. I have done a new build and removed all my debug output to start
        from scratch. Without your patch I get this:
      
        # Samples: 4  of event 'cpu-clock'
        # Event count (approx.): 1000000
        #
        # Children      Self  Command  Shared Object     Symbol
        # ........  ........  .......  ................  ........................
            75.00%     0.00%  true     [unknown]         [k] 0x00000000004bedda
                    |
                    ---0x4bedda
                       |
                       |--50.00%--0x42693a
                       |          |
                       |           --25.00%--0x2a72e0
                       |                     0x2af0ca
                       |                     0x3d1003fe4c0
                       |
                        --25.00%--0x4272bc
                                  0x26fa84
      
        and with your patch (I just rebuilt the perf tool, nothing else and used the same
        perf.data file as input):
      
        # Samples: 4  of event 'cpu-clock'
        # Event count (approx.): 1000000
        #
        # Children      Self  Command  Shared Object               Symbol
        # ........  ........  .......  ..........................  ..................................
            75.00%     0.00%  true     [kernel.vmlinux]            [k] pgm_check_handler
                    |
                    ---pgm_check_handler
                       do_dat_exception
                       handle_mm_fault
                       __handle_mm_fault
                       filemap_map_pages
                       |
                       |--25.00%--rcu_read_lock_held
                       |          rcu_lockdep_current_cpu_online
                       |          0x3d1003ff4c0
                       |
                        --25.00%--lock_release
      
        Looks good to me....
       ----
      Reported-and-Tested-by: Thomas-Mich Richter <tmricht@linux.vnet.ibm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zvonko Kosic <zvonko.kosic@de.ibm.com>
      Link: http://lkml.kernel.org/n/tip-dk0n1uzmbe0tbthrpfqlx6bz@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 26 Jun, 2017 1 commit
    • perf machine: Fix segfault for kernel.kptr_restrict=2 · 3f938ee2
      Committed by Jiri Olsa
      Michael reported a segfault when kernel.kptr_restrict=2 is set.
      
        $ perf record ls
        ...
        perf: Segmentation fault
        Obtained 16 stack frames.
        ./perf(dump_stack+0x2d) [0x5068df]
        ./perf(sighandler_dump_stack+0x2d) [0x5069bf]
        ./perf() [0x43e47b]
        /lib64/libc.so.6(+0x3594f) [0x7f762004794f]
        /lib64/libc.so.6(strlen+0x26) [0x7f762009ef86]
        /lib64/libc.so.6(__strdup+0xd) [0x7f762009ecbd]
        ./perf(maps__set_kallsyms_ref_reloc_sym+0x4d) [0x51590f]
        ./perf(machine__create_kernel_maps+0x136) [0x50a7de]
        ./perf(perf_session__create_kernel_maps+0x2c) [0x510a81]
        ./perf(perf_session__new+0x13d) [0x510e23]
        ./perf() [0x43fd61]
        ./perf(cmd_record+0x704) [0x441823]
        ./perf() [0x4bc1a0]
        ./perf() [0x4bc40d]
        ./perf() [0x4bc55f]
        ./perf(main+0x2d5) [0x4bc939]
        Segmentation fault (core dumped)
      
      The reason is that with kernel.kptr_restrict=2, we don't get
      the symbol from machine__get_running_kernel_start, which we
      want to use in maps__set_kallsyms_ref_reloc_sym and we crash.
      
      Check the symbol name value before calling
      maps__set_kallsyms_ref_reloc_sym() and succeed without ref_reloc_sym
      being set. It's safe because we check its existence before we use it.
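
      The guard described above can be sketched as follows. This is an
      illustration only: set_ref_reloc_sym() and the struct are simplified,
      hypothetical stand-ins for the perf code, and the name is copied by
      hand so that the strlen() call, which is where the backtrace shows
      the crash on a NULL name, stays visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the reference relocation symbol. */
struct ref_reloc_sym {
	char *name;
	unsigned long long addr;
};

/*
 * Returns 0 on success and -1 on allocation failure. A NULL name (for
 * example because kptr_restrict=2 hid the symbol) is not an error: the
 * call succeeds and *out stays NULL.
 */
int set_ref_reloc_sym(struct ref_reloc_sym **out, const char *name,
		      unsigned long long addr)
{
	struct ref_reloc_sym *ref;
	size_t len;

	assert(out);
	*out = NULL;
	if (name == NULL)
		return 0;

	ref = calloc(1, sizeof(*ref));
	if (ref == NULL)
		return -1;

	len = strlen(name) + 1; /* strlen(NULL) would crash without the check */
	ref->name = malloc(len);
	if (ref->name == NULL) {
		free(ref);
		return -1;
	}
	memcpy(ref->name, name, len);
	ref->addr = addr;
	*out = ref;
	return 0;
}
```

      Callers then have to test the resulting pointer before using it,
      which matches the statement that its existence is checked before use.
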
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20170626095153.553-1-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  9. 06 Jun, 2017 1 commit
  10. 03 May, 2017 1 commit
    • perf symbols: Accept symbols starting at address 0 · b843f62a
      Committed by Arnaldo Carvalho de Melo
      That is the case of _text on s390, and we have some functions that return an
      address, using address zero to report problems, oops.
      
      This would lead the symbol loading routines to not use "_text" as the reference
      relocation symbol, or the first symbol for the kernel, but use instead
      "_stext", that is at the same address on x86_64 and others, but not on s390:
      
        [acme@localhost perf-4.11.0-rc6]$ head -15 /proc/kallsyms
        0000000000000000 T _text
        0000000000000418 t iplstart
        0000000000000800 T start
        000000000000080a t .base
        000000000000082e t .sk8x8
        0000000000000834 t .gotr
        0000000000000842 t .cmd
        0000000000000846 t .parm
        000000000000084a t .lowcase
        0000000000010000 T startup
        0000000000010010 T startup_kdump
        0000000000010214 t startup_kdump_relocated
        0000000000011000 T startup_continue
        00000000000112a0 T _ehead
        0000000000100000 T _stext
        [acme@localhost perf-4.11.0-rc6]$
      
      Which in turn would make 'perf test vmlinux' fail because it wouldn't find
      the symbols before "_stext" in kallsyms.
      
      Fix it by using the return value only for errors and storing the
      address, when the symbol is successfully found, in a provided pointer
      arg.
      
      Before this patch:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 40693
        Looking at the vmlinux_path (8 entries long)
        Using /usr/lib/debug/lib/modules/3.10.0-654.el7.s390x/vmlinux for symbols
        ERR : 0: _text not on kallsyms
        ERR : 0x418: iplstart not on kallsyms
        ERR : 0x800: start not on kallsyms
        ERR : 0x80a: .base not on kallsyms
        ERR : 0x82e: .sk8x8 not on kallsyms
        ERR : 0x834: .gotr not on kallsyms
        ERR : 0x842: .cmd not on kallsyms
        ERR : 0x846: .parm not on kallsyms
        ERR : 0x84a: .lowcase not on kallsyms
        ERR : 0x10000: startup not on kallsyms
        ERR : 0x10010: startup_kdump not on kallsyms
        ERR : 0x10214: startup_kdump_relocated not on kallsyms
        ERR : 0x11000: startup_continue not on kallsyms
        ERR : 0x112a0: _ehead not on kallsyms
        <SNIP warnings>
        test child finished with -1
        ---- end ----
        vmlinux symtab matches kallsyms: FAILED!
        [acme@localhost perf-4.11.0-rc6]$
      
      After:
      
        [acme@localhost perf-4.11.0-rc6]$ tools/perf/perf test -v 1
         1: vmlinux symtab matches kallsyms            :
        --- start ---
        test child forked, pid 47160
        <SNIP warnings>
        test child finished with 0
        ---- end ----
        vmlinux symtab matches kallsyms: Ok
        [acme@localhost perf-4.11.0-rc6]$
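
      The interface change behind this fix (return value for errors only,
      address stored through a pointer argument) can be sketched as
      follows. The symbol table type and find_symbol_addr() are
      hypothetical names used for illustration and are not the functions
      touched by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry. */
struct ksym {
	const char *name;
	uint64_t addr;
};

/*
 * Returns 0 and stores the address in *addrp when the symbol is found,
 * -1 otherwise. Address 0 is a valid result, as for _text on s390.
 */
int find_symbol_addr(const struct ksym *syms, size_t nr,
		     const char *name, uint64_t *addrp)
{
	assert(syms && name && addrp);
	for (size_t i = 0; i < nr; i++) {
		if (strcmp(syms[i].name, name) == 0) {
			*addrp = syms[i].addr;
			return 0;
		}
	}
	return -1;
}
```

      With the old convention of returning the address and treating 0 as
      failure, a lookup of "_text" in the kallsyms excerpt above would be
      indistinguishable from a missing symbol.
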
      Reported-by: Michael Petlan <mpetlan@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-9x9bwgd3btwdk1u51xie93fz@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  11. 25 Apr, 2017 1 commit
  12. 20 Apr, 2017 5 commits
  13. 14 Mar, 2017 1 commit
    • perf tools: Add PERF_RECORD_NAMESPACES to include namespaces related info · f3b3614a
      Committed by Hari Bathini
      Introduce a new option to record PERF_RECORD_NAMESPACES events emitted
      by the kernel when fork, clone, setns or unshare are invoked. And update
      perf-record documentation with the new option to record namespace
      events.
      
      Committer notes:
      
      Combined it with a later patch to allow printing it via 'perf report -D'
      and be able to test the feature introduced in this patch. Had to move
      here also perf_ns__name(), that was introduced in another later patch.
      
      Also used PRIu64 and PRIx64 to fix the build in some environments wrt:
      
        util/event.c:1129:39: error: format '%lx' expects argument of type 'long unsigned int', but argument 6 has type 'long long unsigned int' [-Werror=format=]
           ret  += fprintf(fp, "%u/%s: %lu/0x%lx%s", idx
                                               ^
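
      A minimal sketch of the portable form follows. The helper name
      format_ns_link() and its parameters are hypothetical; only the use of
      the PRIu64 and PRIx64 macros from <inttypes.h> with 64-bit values
      reflects the fix described above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one namespace entry as "idx/name: dev/0xino". PRIu64 and
 * PRIx64 expand to the right length modifier for uint64_t on every
 * platform, whether it is "long" or "long long" underneath.
 */
int format_ns_link(char *buf, size_t size, unsigned int idx,
		   const char *name, uint64_t dev, uint64_t ino)
{
	assert(buf && name);
	return snprintf(buf, size, "%u/%s: %" PRIu64 "/0x%" PRIx64,
			idx, name, dev, ino);
}
```
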
      Testing it:
      
        # perf record --namespaces -a
        ^C[ perf record: Woken up 1 times to write data ]
        [ perf record: Captured and wrote 1.083 MB perf.data (423 samples) ]
        #
        # perf report -D
        <SNIP>
        3 2028902078892 0x115140 [0xa0]: PERF_RECORD_NAMESPACES 14783/14783 - nr_namespaces: 7
                      [0/net: 3/0xf0000081, 1/uts: 3/0xeffffffe, 2/ipc: 3/0xefffffff, 3/pid: 3/0xeffffffc,
                       4/user: 3/0xeffffffd, 5/mnt: 3/0xf0000000, 6/cgroup: 3/0xeffffffb]
      
        0x1151e0 [0x30]: event: 9
        .
        . ... raw event: size 48 bytes
        .  0000:  09 00 00 00 02 00 30 00 c4 71 82 68 0c 7f 00 00  ......0..q.h....
        .  0010:  a9 39 00 00 a9 39 00 00 94 28 fe 63 d8 01 00 00  .9...9...(.c....
        .  0020:  03 00 00 00 00 00 00 00 ce c4 02 00 00 00 00 00  ................
        <SNIP>
              NAMESPACES events:          1
        <SNIP>
        #
      Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Aravinda Prasad <aravinda@linux.vnet.ibm.com>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/148891930386.25309.18412039920746995488.stgit@hbathini.in.ibm.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 04 Mar, 2017 1 commit
  15. 15 Feb, 2017 1 commit
  16. 14 Feb, 2017 1 commit
  17. 12 Jan, 2017 1 commit
  18. 15 Nov, 2016 1 commit
    • perf report: Add branch flag to callchain cursor node · 410024db
      Committed by Jin Yao
      Since the branch ip has been added to the call stack for easier browsing,
      this patch adds more branch information. For example, it adds a flag
      indicating whether this ip is a branch, along with the branch flags.
      
      Then we can know if the cursor node represents a branch and which
      branch flags it has.
      
      The branch history code has a loop detection pass that removes loops. It
      would be nice to know how many loops were removed so that, in later
      steps, we can compute the average number of iterations.
      
      For example:
      
      Before remove_loops(),
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x300, to = 0x250
      entry3: from = 0x300, to = 0x250
      entry4: from = 0x700, to = 0x800
      
      After remove_loops()
      entry0: from = 0x100, to = 0x200
      entry1: from = 0x300, to = 0x250
      entry2: from = 0x700, to = 0x800
      
      The original entry2 and entry3 are removed. So the number of iterations
      (from = 0x300, to = 0x250) is equal to removed number + 1 (2 + 1).
      
      iterations = removed number + 1;
      average iterations = Sum(iterations) / number of samples
      
      This formula ignores other cases, for example iterations that cross
      multiple buffers or a buffer that contains 2+ loops, because in
      practice it is good enough.
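
      The example above can be reproduced with the following sketch. It is
      a deliberately simplified stand-in for remove_loops(): it only
      collapses runs of identical adjacent entries, which is enough for the
      example but is an assumption about the behavior, not the detection
      logic perf uses. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified branch entry with only the fields used in the example. */
struct branch_entry {
	uint64_t from;
	uint64_t to;
};

/*
 * Collapse runs of identical adjacent entries in place. Returns the new
 * number of entries and stores the number of removed ones in *removed.
 */
size_t remove_adjacent_loops(struct branch_entry *e, size_t nr,
			     size_t *removed)
{
	size_t out = 0;

	assert(e && removed);
	*removed = 0;
	for (size_t i = 0; i < nr; i++) {
		if (out > 0 && e[out - 1].from == e[i].from &&
		    e[out - 1].to == e[i].to) {
			(*removed)++;
			continue;
		}
		e[out++] = e[i];
	}
	return out;
}

/* iterations = removed number + 1 */
size_t loop_iterations(size_t removed)
{
	return removed + 1;
}
```
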
      Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
      Acked-by: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linux-kernel@vger.kernel.org
      Cc: Yao Jin <yao.jin@linux.intel.com>
      Link: http://lkml.kernel.org/n/1477876794-30749-2-git-send-email-yao.jin@linux.intel.com
      [ Renamed 'iter' to 'nr_loop_iter' for clarity ]
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  19. 03 Oct, 2016 1 commit
    • perf tools: Experiment with cppcheck · 18ef15c6
      Committed by Arnaldo Carvalho de Melo
      Experimenting a bit with cppcheck[1], a static checker brought to my
      attention by Colin: reducing the scope of some variables and, in the
      process, the number of source code lines:
      
        $ cppcheck --enable=style tools/perf/util/thread.c
        Checking tools/perf/util/thread.c...
        [tools/perf/util/thread.c:17]: (style) The scope of the variable 'leader' can be reduced.
        [tools/perf/util/thread.c:133]: (style) The scope of the variable 'err' can be reduced.
        [tools/perf/util/thread.c:273]: (style) The scope of the variable 'err' can be reduced.
      
      Will continue later, but these are already useful, keep them.
      
      1: https://sourceforge.net/p/cppcheck/wiki/Home/
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ixws7lbycihhpmq9cc949ti6@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 05 Sep, 2016 2 commits
  21. 27 Jul, 2016 1 commit
  22. 22 Jun, 2016 1 commit
  23. 07 Jun, 2016 1 commit
    • perf unwind: Move unwind__prepare_access from thread_new into thread__insert_map · 8132a2a8
      Committed by He Kuang
      To determine the libunwind methods to use, we should get the
      32bit/64bit information from maps of a thread. When a thread is newly
      created, the information is not prepared. This patch moves
      unwind__prepare_access() into thread__insert_map() so we can get the
      information we need from maps. Meanwhile, let thread__insert_map()
      return a value and show a message on error.
      Signed-off-by: He Kuang <hekuang@huawei.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/r/1464924803-22214-5-git-send-email-hekuang@huawei.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  24. 20 May, 2016 3 commits
  25. 17 May, 2016 2 commits
  26. 06 May, 2016 3 commits
    • perf callchain: Fix incorrect ordering of entries · 9919a65e
      Committed by Chris Phlipot
      The existing implementation of thread__resolve_callchain, under certain
      circumstances, can assemble callchain entries in the incorrect order.
      
      The callchain entries are resolved incorrectly for a sample when all of
      the following conditions are met:
      
      1. callchain_param.order is set to ORDER_CALLER
      
      2. thread__resolve_callchain_sample is able to resolve callchain entries
         for the sample.
      
      3. unwind__get_entries is also able to resolve callchain entries for the
         sample.
      
      The fix is accomplished by reversing the order in which
      thread__resolve_callchain_sample and unwind__get_entries are called when
      callchain_param.order is set to ORDER_CALLER.
      
      Unwind specific code from thread__resolve_callchain is also moved into a
      new static function to improve readability of the fix.
      
      How to Reproduce the Existing Bug:
      
      Modifying perf script to print call trees in the opposite order, or
      applying the remaining patches from this series and comparing the
      output of export-to-postgresql.py, are the easiest ways to see the
      bug. However, it can still be seen in current builds using perf
      report.
      
      Here is how I can reproduce the bug using perf report:
      
        # perf record --call-graph=dwarf stress -c 1 -t 5
      
      When I run this command:
      
        # perf report --call-graph=flat,0,0,callee
      
      This callchain, containing kernel (handle_irq_event, etc) and userspace
      samples (__libc_start_main, etc) is contained in the output, which looks
      correct (callee order):
      
                      gen8_irq_handler
                      handle_irq_event_percpu
                      handle_irq_event
                      handle_edge_irq
                      handle_irq
                      do_IRQ
                      ret_from_intr
                      __random
                      rand
                      0x558f2a04dded
                      0x558f2a04c774
                      __libc_start_main
                      0x558f2a04dcd9
      
      Now run this command using caller order:
      
        # perf report --call-graph=flat,0,0,caller
      
      It is expected to see the exact reverse of the above when using caller
      order (with "0x558f2a04dcd9" at the top and "gen8_irq_handler" at the
      bottom) in the output, but it is nowhere to be found.
      
      Instead you see this:
      
                      ret_from_intr
                      do_IRQ
                      handle_irq
                      handle_edge_irq
                      handle_irq_event
                      handle_irq_event_percpu
                      gen8_irq_handler
                      0x558f2a04dcd9
                      __libc_start_main
                      0x558f2a04c774
                      0x558f2a04dded
                      rand
                      __random
      
      Notice how internally the kernel symbols are reversed and the user space
      symbols are reversed, but the kernel symbols still appear above the user
      space symbols.
      
      If this patch is applied and perf script is re-run, you will see the
      expected output (with "0x558f2a04dcd9" at the top and "gen8_irq_handler"
      at the bottom):
      
                      0x558f2a04dcd9
                      __libc_start_main
                      0x558f2a04c774
                      0x558f2a04dded
                      rand
                      __random
                      ret_from_intr
                      do_IRQ
                      handle_irq
                      handle_edge_irq
                      handle_irq_event
                      handle_irq_event_percpu
                      gen8_irq_handler
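
      The ordering rule behind the fix can be sketched as follows. This is
      a simplified model, not the patched thread__resolve_callchain(): the
      two input arrays stand for the entries resolved from the sample
      (kernel part) and from unwinding (user part), both given callee
      first, and the function names are hypothetical. Only ORDER_CALLER is
      taken from the text above; ORDER_CALLEE is assumed as its
      counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum chain_order { ORDER_CALLEE, ORDER_CALLER };

/* Append chain to out, optionally reversed; returns the new length. */
size_t append_chain(const char **out, size_t n,
		    const char *const *chain, size_t nr, int reverse)
{
	for (size_t i = 0; i < nr; i++)
		out[n++] = chain[reverse ? nr - 1 - i : i];
	return n;
}

/*
 * Combine the two partial chains. In caller order both the order of the
 * parts and the order within each part are reversed, so the outermost
 * user frame comes first and the innermost kernel frame comes last.
 */
size_t resolve_callchain(const char **out,
			 const char *const *kernel, size_t nr_kernel,
			 const char *const *user, size_t nr_user,
			 enum chain_order order)
{
	size_t n = 0;

	assert(out && kernel && user);
	if (order == ORDER_CALLER) {
		n = append_chain(out, n, user, nr_user, 1);
		n = append_chain(out, n, kernel, nr_kernel, 1);
	} else {
		n = append_chain(out, n, kernel, nr_kernel, 0);
		n = append_chain(out, n, user, nr_user, 0);
	}
	return n;
}
```

      The buggy output shown earlier corresponds to reversing each part
      while still emitting the kernel part first.
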
      Signed-off-by: Chris Phlipot <cphlipot0@gmail.com>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1461831551-12213-2-git-send-email-cphlipot0@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf hists: Move sort__has_parent into struct perf_hpp_list · de7e6a7c
      Committed by Jiri Olsa
      Now that sort dimensions are private to struct hists, the dimension
      booleans need to be hists-specific as well.
      
      Move sort__has_parent into struct perf_hpp_list.
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/1462276488-26683-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf machine: Introduce number of threads member · d2c11034
      Committed by Arnaldo Carvalho de Melo
      To be used, for instance, for pre-allocating an rb_tree array for
      sorting by other keys besides the current pid one.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-ja0ifkwue7ttjhbwijn6g6eu@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  27. 27 Apr, 2016 1 commit
  28. 19 Apr, 2016 1 commit