1. 25 Oct, 2016 2 commits
    • perf bench futex: Avoid worker cacheline bouncing · e2e1680f
      Committed by Davidlohr Bueso
      Sebastian noted that the overhead of worker thread ops (throughput)
      accounting was causing 'perf' to appear in the profiles, consuming a
      non-trivial amount of CPU (13%).
      
      This is caused by cacheline bouncing from the increment of w->ops.
      
      We can easily fix this by working on a local copy and updating the
      actual worker once it is done running and ready to show the program summary.
      There is no danger of the worker being accessed concurrently, so we can
      trust that no stale value is seen by another thread.
      
      This also gets rid of the unnecessary cache alignment hack; it's not
      worth it.
      Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
      Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Link: http://lkml.kernel.org/r/1477342613-9938-2-git-send-email-dave@stgolabs.net
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
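The fix described in the commit above (count in a local variable, publish to the worker once at the end) can be sketched as follows. This is a minimal illustration of the structure only, not the actual 'perf bench futex' code, which is written in C; Python does not reproduce the cacheline effects, and the names `Worker`, `run_worker` and `run_benchmark` are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass
class Worker:
    """Hypothetical stand-in for the benchmark's per-thread worker record."""
    ops: int = 0


def run_worker(worker: Worker, iterations: int) -> None:
    # Count in a local variable so the shared record is not written
    # on every iteration of the hot loop.
    ops = 0
    for _ in range(iterations):
        ops += 1  # stands in for one benchmarked operation
    # Publish the total once, when the worker is done running.
    worker.ops = ops


def run_benchmark(nthreads: int, iterations: int) -> list:
    workers = [Worker() for _ in range(nthreads)]
    threads = [
        threading.Thread(target=run_worker, args=(w, iterations))
        for w in workers
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The summary reads each worker only after its thread has been joined,
    # so it cannot observe a stale count.
    return [w.ops for w in workers]


if __name__ == "__main__":
    print(run_benchmark(4, 1000))
```

Each worker record is written exactly once by its own thread and read only after the join, which mirrors the commit's argument that no other thread can see a stale value.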
    • Merge tag 'perf-core-for-mingo-20161024' of... · 76e2d261
      Committed by Ingo Molnar
      Merge tag 'perf-core-for-mingo-20161024' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
      
      New features:
      
      - Dynamically change verbosity level by pressing 'V' in the 'perf top/report'
        hists TUI browser (Alexis Berlemont)
      
      - Implement 'perf trace --delay' in the same fashion as in 'perf record --delay',
        to skip sampling workload initialization events (Alexis Berlemont)
      
      - Make vendor named events case insensitive in 'perf list', i.e.
        'perf list LONGEST_LAT' works just the same as 'perf list longest_lat' (Andi Kleen)
      
      - Show instruction bytes and length in 'perf script' for Intel PT and BTS (Andi Kleen, Adrian Hunter)
      
         E.g.:
      
          % perf record -e intel_pt// foo
          % perf script --itrace=i0ns -F ip,insn,insnlen
           ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00
           ffffffff81012334 ilen: 1 insn: 5b
           ffffffff81012335 ilen: 1 insn: 5d
           ffffffff81012336 ilen: 1 insn: c3
           ffffffff810123e3 ilen: 1 insn: 5b
           ffffffff810123e4 ilen: 2 insn: 41 5c
           ffffffff810123e6 ilen: 1 insn: 5d
           ffffffff810123e7 ilen: 1 insn: c3
           ffffffff810124a6 ilen: 2 insn: 31 c0
           ffffffff810124a8 ilen: 9 insn: 41 83 bc 24 a8 01 00 00 01
           ffffffff810124b1 ilen: 2 insn: 75 87
      
      - Allow enabling the perf_event_attr.branch_type attribute member (Andi Kleen):
      
        perf record -e sched:sched_switch,cpu/cpu-cycles,branch_type=any/ ...
      
      - Add unwinding support for jitdump (Stefano Sanfilippo)
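The 'perf script -F ip,insn,insnlen' output shown in the list above is easy to post-process. The sketch below parses one such line and checks that the reported length matches the number of instruction bytes. It assumes the exact line layout of the sample above (address, `ilen:`, `insn:`); other field selections produce different layouts, and the helper names `parse_insn_line` and `is_fallthrough` are hypothetical.

```python
import re

# Matches lines such as: "ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00"
LINE_RE = re.compile(
    r"^\s*(?P<ip>[0-9a-f]+)\s+ilen:\s+(?P<ilen>\d+)\s+insn:\s+"
    r"(?P<insn>(?:[0-9a-f]{2}\s*)+)$"
)


def parse_insn_line(line: str):
    """Return (ip, ilen, insn_bytes) for one line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ip = int(m.group("ip"), 16)
    ilen = int(m.group("ilen"))
    insn = bytes.fromhex(m.group("insn"))  # fromhex skips the spaces
    if len(insn) != ilen:
        raise ValueError(f"ilen {ilen} does not match {len(insn)} bytes")
    return ip, ilen, insn


def is_fallthrough(prev, cur) -> bool:
    """True if cur starts right after prev, i.e. no branch was taken between them."""
    return prev[0] + prev[1] == cur[0]


if __name__ == "__main__":
    a = parse_insn_line(" ffffffff8101232f ilen: 5 insn: 0f 1f 44 00 00")
    b = parse_insn_line(" ffffffff81012334 ilen: 1 insn: 5b")
    print(hex(a[0]), a[1], a[2].hex(), is_fallthrough(a, b))
```

Comparing `ip + ilen` with the next sampled address is one simple way to spot where the trace jumps, for example between `ffffffff81012336` and `ffffffff810123e3` in the sample.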
      
      Fixes:
      
      - Use raw_syscall:sys_enter timestamp in 'perf trace' (Arnaldo Carvalho de Melo)
      
      Infrastructure:
      
      - Allow jitdump to be built without libdwarf (Maciej Debski)
      
      - Sync x86's syscall table tools/ copy (Arnaldo Carvalho de Melo)
      
      - Fixes to avoid calling die() in library functions already propagating other
        errors (Arnaldo Carvalho de Melo)
      
      - Improvements to allow libtraceevent to be properly installed in distro
        packages (Jiri Olsa)
      
      - Removing coresight miscellaneous debug output (Mathieu Poirier)
      
      - Cache align the 'perf bench futex' worker struct (Sebastian Andrzej Siewior)
      
      Documentation:
      
      - Minor improvements on the documentation of event parameters (Andi Kleen)
      
      - Add jitdump format specification document (Stephane Eranian)
      
      Spelling fixes:
      
      - Fix typo "No enough" to "Not enough" (Alexander Alemayhu)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 24 Oct, 2016 37 commits
  3. 22 Oct, 2016 1 commit
    • Merge tag 'perf-c2c-for-mingo-20161021' of... · e9c84892
      Committed by Ingo Molnar
      Merge tag 'perf-c2c-for-mingo-20161021' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
      
      Pull new 'perf c2c' tool from Arnaldo Carvalho de Melo:
      
      - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
      
        It allows you to track down cacheline contention. The tool is based
        on x86's load latency and precise store facility events provided by
        Intel CPUs.
      
        It was tested by Joe Mario and has proven to be useful, finding some
        cacheline contentions. Joe also wrote a blog post about the c2c tool
        with examples:
      
          https://joemario.github.io/blog/2016/09/01/c2c-blog/
      
        Excerpt of the content on this site:
      
        ---
          At a high level, “perf c2c” will show you:
      
          * The cachelines where false sharing was detected.
          * The readers and writers to those cachelines, and the offsets where those accesses occurred.
          * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
          * The source file and line number for each reader and writer.
          * The average load latency for the loads to those cachelines.
          * Which NUMA nodes the samples for a cacheline came from and which CPUs were involved.
      
          Using perf c2c is similar to using the Linux perf tool today.
          First collect data with “perf c2c record”. Then generate a report output with “perf c2c report”.
        ---
      
        There one finds extensive details on using the tool, with tips on
        reducing the volume of samples while still capturing enough to do
        its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
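To make the idea behind the report described above more concrete, the sketch below groups memory access samples by cacheline and lists the lines touched by more than one thread with at least one store, together with the offsets involved. This is only an illustration of the grouping, not how 'perf c2c' is implemented: the real tool relies on hardware load latency and precise store events, while the sample tuples, the 64-byte line size and the name `contended_cachelines` are assumptions made here.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def contended_cachelines(samples):
    """Group (tid, address, is_store) samples by cacheline.

    Returns {cacheline_base: sorted list of (tid, offset, is_store)} for
    lines accessed by more than one thread with at least one store.
    """
    lines = defaultdict(list)
    for tid, addr, is_store in samples:
        base = addr & ~(CACHELINE_SIZE - 1)
        lines[base].append((tid, addr - base, is_store))

    report = {}
    for base, accesses in lines.items():
        tids = {tid for tid, _, _ in accesses}
        has_store = any(is_store for _, _, is_store in accesses)
        if len(tids) > 1 and has_store:
            report[base] = sorted(accesses)
    return report


if __name__ == "__main__":
    # Hypothetical samples: thread 1 writes offset 0 and thread 2 reads
    # offset 8 of the same line; thread 3 only reads another line.
    samples = [
        (1, 0x1000, True),
        (2, 0x1008, False),
        (3, 0x2000, False),
        (3, 0x2010, False),
    ]
    for base, accesses in contended_cachelines(samples).items():
        print(hex(base), accesses)
```

Accesses from different threads at different offsets of the same line, as for `0x1000` here, are the pattern typically associated with false sharing.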