1. 28 Feb, 2010 4 commits
    • F
      x86/hw-breakpoints: Remove the name field · 3d083407
      Frederic Weisbecker committed
      Remove the name field from arch_hw_breakpoint. We never deal
      with target symbols at the arch level, nor do we ever need to
      store them. The field is a leftover from the previous version of
      the x86 breakpoint backend.
      
      Let's remove it.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: K.Prasad <prasad@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      3d083407
    • F
      perf: Remove pointless breakpoint union · dd8b1cf6
      Frederic Weisbecker committed
      Remove the pointless union in the breakpoint field of hw_perf_event.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      dd8b1cf6
    • F
      perf lock: Drop the buffers multiplexing dependency · b67577df
      Frederic Weisbecker committed
      We need to deal with time ordered events to build a correct
      state machine of lock events. This is why we multiplex the lock
      events buffers. But the ordering is done from the kernel, on
      the tracing fast path, leading to high contention between cpus.
      
      Without multiplexing, the events appear in a weak order.
      If we have four events, each split per cpu, perf record will
      read the events buffers in the following order:
      
      [ CPU0 ev0, CPU0 ev1, CPU0 ev3, CPU0 ev4, CPU1 ev0, CPU1 ev0....]
      
      To handle post-processing reordering, we could just read and sort
      everything in memory, but that doesn't scale with high amounts
      of events: lock events can pile up in huge amounts in a short time.
      
      Basically we need to sort in memory and find a "grace period"
      point when we know that a given slice of previously sorted events
      can be committed for post-processing, so that we can unload the
      memory usage step by step and keep a scalable sorting list.
      
      There are no strong rules about how to define such a "grace period".
      What this patch does is:
      
      We define a FLUSH_PERIOD value that defines a grace period in
      seconds.
      We want to have a slice of events covering 2 * FLUSH_PERIOD in our
      sorted list.
      If FLUSH_PERIOD is big enough, it ensures that all events that occurred
      in the first half of the timeslice have been buffered, and that no
      further events will land inside this first half. Then once we reach
      the 2 * FLUSH_PERIOD timeslice, we flush the first half to be gentle
      with the memory (the second half can still get new events in the
      middle, so we wait another period to flush it).
      
      FLUSH_PERIOD is defined to 5 seconds. Say the first event started on
      time t0. We can safely assume that at the time we are processing
      events of t0 + 10 seconds, there won't be any more events to read
      from perf.data that occurred between t0 and t0 + 5 seconds. Hence
      we can safely flush the first half.
      
      To point out funky bugs, we have a guardian that checks a new event
      timestamp is not below the last event's timestamp flushed and that
      displays a warning in this case.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      b67577df
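The grace-period flushing described in this commit message can be sketched roughly as follows. This is an illustrative Python model, not the C code from the patch: the class `OrderedEvents`, its `process` callback, and the method names are hypothetical; only `FLUSH_PERIOD` and its value of 5 seconds come from the message.

```python
import bisect
import warnings

FLUSH_PERIOD = 5.0  # seconds, the value named in the commit message


class OrderedEvents:
    """Keep events sorted in memory and commit the oldest slice when safe."""

    def __init__(self, process):
        self.process = process      # hypothetical post-processing callback
        self.pending = []           # (timestamp, seq, event), kept sorted
        self.seq = 0                # tie-breaker, so events are never compared
        self.last_flushed = None    # timestamp of the last committed event

    def queue(self, timestamp, event):
        # Guardian: an event older than what was already flushed means the
        # grace period was too short or something else went wrong.
        if self.last_flushed is not None and timestamp < self.last_flushed:
            warnings.warn("event timestamp below last flushed timestamp")
        bisect.insort(self.pending, (timestamp, self.seq, event))
        self.seq += 1
        # Once the sorted window spans 2 * FLUSH_PERIOD, commit its first half.
        oldest, newest = self.pending[0][0], self.pending[-1][0]
        if newest - oldest >= 2 * FLUSH_PERIOD:
            self.flush(oldest + FLUSH_PERIOD)

    def flush(self, limit=None):
        """Commit pending events with timestamp <= limit (all if None)."""
        while self.pending and (limit is None or self.pending[0][0] <= limit):
            ts, _, event = self.pending.pop(0)
            self.last_flushed = ts
            self.process(ts, event)
```

Calling `flush()` with no argument at end of input drains whatever is still pending. The list-based `pop(0)` keeps the sketch short; a real implementation would want a cheaper container.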
    • H
      perf lock: Fix and add misc documentally things · 84c6f88f
      Hitoshi Mitake committed
      I forgot to add the 'perf lock' line to command-list.txt,
      so users of perf could not find perf lock when they typed 'perf'.
      
      Fixing command-list.txt requires a document
      (tools/perf/Documentation/perf-lock.txt).
      But perf lock is too much "under construction" to write a
      stable document, so this is something like a pseudo document for now.
      
      I also wrote a description of perf lock in the help section of
      CONFIG_LOCK_STAT, which will guide users of lock trace events.
      Signed-off-by: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      LKML-Reference: <1265267295-8388-1-git-send-email-mitake@dcl.info.waseda.ac.jp>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      84c6f88f
  2. 27 Feb, 2010 4 commits
  3. 26 Feb, 2010 9 commits
    • D
      perf tools: Flush maps on COMM events · 4385d580
      David S. Miller committed
      Even though we don't register the counters until the child is right about
      to exec(), we're still going to get at least a few events while the
      fork()'d child is still executing 'perf' and in particular we're going to
      get the MMAP events.
      
      We can't distinguish the ones in the newly executed process because the
      PID will be the same.
      
      One way to solve this would be to have a PERF_RECORD_EXEC event, and when
      this is seen 'perf' can flush its map cache.  We can't use
      PERF_RECORD_COMM since that's generated by other things, not just exec().
      
      Actually, thinking about it some more, using PERF_RECORD_COMM might be a
      good enough approximation.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267196914-16238-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4385d580
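The approximation discussed in this commit message, treating a COMM event as a hint that exec() happened and dropping the maps recorded so far for that PID, might look like the following. This is a hypothetical Python model: `MapCache` and its methods are invented for illustration and do not mirror perf's internal data structures.

```python
from collections import defaultdict


class MapCache:
    """Per-PID list of mapped regions, flushed when a COMM event arrives."""

    def __init__(self):
        # pid -> list of (start, length, filename)
        self.maps = defaultdict(list)

    def on_mmap(self, pid, start, length, filename):
        self.maps[pid].append((start, length, filename))

    def on_comm(self, pid, comm):
        # Approximation: assume the COMM event was caused by exec(), so any
        # maps seen earlier belong to the old image (here: 'perf' itself).
        self.maps.pop(pid, None)

    def lookup(self, pid, addr):
        for start, length, filename in self.maps.get(pid, []):
            if start <= addr < start + length:
                return filename
        return None
```

The cost of the approximation is that a COMM event generated by something other than exec() also discards valid maps for that PID.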
    • P
      perf_events, x86: Split PMU definitions into separate files · f22f54f4
      Peter Zijlstra committed
      Split amd,p6,intel into separate files so that we can easily deal with
      CONFIG_CPU_SUP_* things, needed to make things build now that perf_event.c
      relies on symbols from amd.c
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f22f54f4
    • A
      perf annotate: Handle samples not at objdump output addr boundaries · 48fb4fdd
      Arnaldo Carvalho de Melo committed
      Without this patch we get this for need_resched:
      
      [root@mica ~]# perf annotate need_resched
      
      ------------------------------------------------
       Percent |      Source code & Disassembly of vmlinux
      ------------------------------------------------
               :
               :
               :      Disassembly of section .text:
               :
               :      ffffffff810095ed <need_resched>:
               :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
               :      }
               :
               :      static inline int need_resched(void)
               :      {
          0.00 :      ffffffff810095ed:       55                      push   %rbp
               :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
          0.00 :      ffffffff810095ee:       be 03 00 00 00          mov    $0x3,%esi
               :
               :      static inline struct thread_info *current_thread_info(void)
               :      {
               :              struct thread_info *ti;
               :              ti = (void *)(percpu_read_stable(kernel_stack) +
          0.00 :      ffffffff810095f3:       65 48 8b 3c 25 48 b5    mov    %gs:0xb548,%rdi
          0.00 :      ffffffff810095fa:       00 00
               :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
               :      }
               :
               :      static inline int need_resched(void)
               :      {
          0.00 :      ffffffff810095fc:       48 89 e5                mov    %rsp,%rbp
               :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
          0.00 :      ffffffff810095ff:       48 81 ef d8 1f 00 00    sub    $0x1fd8,%rdi
          0.00 :      ffffffff81009606:       e8 9d ff ff ff          callq  ffffffff810095a8 <test_ti_thread_flag>
               :      }
          0.00 :      ffffffff8100960b:       c9                      leaveq
          0.00 :      ffffffff8100960c:       85 c0                   test   %eax,%eax
          0.00 :      ffffffff8100960e:       0f 95 c0                setne  %al
          0.00 :      ffffffff81009611:       0f b6 c0                movzbl %al,%eax
               :      Disassembly of section .vsyscall_0:
               :      Disassembly of section .vsyscall_fn:
               :      Disassembly of section .vsyscall_1:
               :      Disassembly of section .vsyscall_2:
               :      Disassembly of section .init.text:
               :      Disassembly of section .altinstr_replacement:
               :      Disassembly of section .exit.text:
      [root@mica ~]#
      
      But from the 'perf report' result we know that there are hits
      for need_resched on a 4 way machine mostly doing nothing, so
      after adding code to show what is in each hist offset and
      collapsing IP hits for what happens between objdump lines we
      get, for the same perf.data file:
      
      [root@mica ~]# perf annotate -v need_resched
      
      ------------------------------------------------
       Percent |      Source code & Disassembly of vmlinux
      ------------------------------------------------
               :
               :
               :      Disassembly of section .text:
               :
               :      ffffffff810095ed <need_resched>:
               :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
               :      }
               :
               :      static inline int need_resched(void)
               :      {
          0.00 :      ffffffff810095ed:       55                      push   %rbp
               :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
         52.78 :      ffffffff810095ee:       be 03 00 00 00          mov    $0x3,%esi
               :
               :      static inline struct thread_info *current_thread_info(void)
               :      {
               :              struct thread_info *ti;
               :              ti = (void *)(percpu_read_stable(kernel_stack) +
          0.00 :      ffffffff810095f3:       65 48 8b 3c 25 48 b5    mov    %gs:0xb548,%rdi
          0.00 :      ffffffff810095fa:       00 00
               :              return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
               :      }
               :
               :      static inline int need_resched(void)
               :      {
          0.00 :      ffffffff810095fc:       48 89 e5                mov    %rsp,%rbp
               :              return unlikely(test_thread_flag(TIF_NEED_RESCHED));
          9.72 :      ffffffff810095ff:       48 81 ef d8 1f 00 00    sub    $0x1fd8,%rdi
          0.00 :      ffffffff81009606:       e8 9d ff ff ff          callq  ffffffff810095a8 <test_ti_thread_flag>
               :      }
          0.00 :      ffffffff8100960b:       c9                      leaveq
          0.00 :      ffffffff8100960c:       85 c0                   test   %eax,%eax
         37.50 :      ffffffff8100960e:       0f 95 c0                setne  %al
          0.00 :      ffffffff81009611:       0f b6 c0                movzbl %al,%eax
               :      Disassembly of section .vsyscall_0:
               :      Disassembly of section .vsyscall_fn:
               :      Disassembly of section .vsyscall_1:
               :      Disassembly of section .vsyscall_2:
               :      Disassembly of section .init.text:
               :      Disassembly of section .altinstr_replacement:
               :      Disassembly of section .exit.text:
      [root@mica ~]#
      
      And now 'perf annotate -v', verbose mode, will show the hits per
      precise IP, so that one can make sense of the attribution to
      each objdump line:
      
      [root@mica ~]# perf annotate -v need_resched
      Looking at the vmlinux_path (5 entries long)
      Using /lib/modules/2.6.33-rc8-tip-00784-g3471df5-dirty/build/vmlinux for symbols
      annotate_sym: filename=/lib/modules/2.6.33-rc8-tip-00784-g3471df5-dirty/build/vmlinux, sym=need_resched, start=0xffffffff810095ed, end=0xffffffff81009614
      
      ------------------------------------------------
       Percent |      Source code & Disassembly of vmlinux
      ------------------------------------------------
                      ffffffff810095f1: 152
                      ffffffff81009603: 28
                      ffffffff8100960f: 55
                      ffffffff81009610: 53
                                h->sum: 288
      <SNIP same annotation>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1267194194-15670-1-git-send-email-acme@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      48fb4fdd
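Collapsing IP hits onto objdump lines, as this commit describes, amounts to attributing each sampled IP to the closest objdump line address at or below it. The sketch below is a Python illustration with a hypothetical helper `collapse_hits`; it is not the perf implementation. The addresses and hit counts in the usage example are copied from the output quoted in the commit message.

```python
import bisect


def collapse_hits(line_addrs, ip_hits):
    """Return {objdump line address: percent of hits attributed to it}."""
    line_addrs = sorted(line_addrs)
    per_line = dict.fromkeys(line_addrs, 0)
    for ip, count in ip_hits.items():
        # Index of the last objdump line whose address is <= ip.
        idx = bisect.bisect_right(line_addrs, ip) - 1
        if idx < 0:
            continue  # IP below the first line: ignored in this sketch
        per_line[line_addrs[idx]] += count
    total = sum(per_line.values())
    return {addr: (100.0 * n / total if total else 0.0)
            for addr, n in per_line.items()}


base = 0xffffffff81009500
lines = [base + off for off in
         (0xed, 0xee, 0xf3, 0xfa, 0xfc, 0xff, 0x106, 0x10b, 0x10c, 0x10e, 0x111)]
hits = {base + 0xf1: 152, base + 0x103: 28, base + 0x10f: 55, base + 0x110: 53}
percent = collapse_hits(lines, hits)
```

With these inputs the three non-zero lines come out at 52.78, 9.72 and 37.50 percent once rounded to two decimals, matching the second annotate listing.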
    • P
      perf_events, x86: Remove superflous MSR writes · 6667661d
      Peter Zijlstra committed
      We re-program the event control register every time we reset the count,
      this appears to be superfluous, hence remove it.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6667661d
    • P
      perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in() · 6e37738a
      Peter Zijlstra committed
      Since the cpu argument to hw_perf_group_sched_in() is always
      smp_processor_id(), simplify the code a little by removing this argument
      and using the current cpu where needed.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1265890918.5396.3.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6e37738a
    • S
      perf_events, x86: AMD event scheduling · 38331f62
      Stephane Eranian committed
      This patch adds correct AMD NorthBridge event scheduling.
      
      NB events measure L3 cache and HyperTransport traffic. They are
      identified by an event code >= 0xe0. They measure events on the
      Northbridge, which is shared by all cores on a package. NB events are
      counted on a shared set of counters. When a NB event is programmed in a
      counter, the data actually comes from a shared counter. Thus, access to
      those counters needs to be synchronized.
      
      We implement the synchronization such that no two cores can be measuring
      NB events using the same counters. Thus, we maintain a per-NB allocation
      table. The available slot is propagated using the event_constraint
      structure.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703957.0702d00a.6bf2.7b7d@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      38331f62
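The per-NB allocation table described in this commit message can be pictured as one ownership slot per shared counter, so that no two cores measure NB events on the same counter. The Python model below is a simplified assumption-laden sketch: the `NorthBridge` class, its methods, and the default of four counters are hypothetical, and it ignores the atomicity a real multi-core implementation needs. Only the `>= 0xe0` event-code threshold comes from the message.

```python
NB_EVENT_MIN = 0xE0  # event codes at or above this are NB events (per the message)


def is_nb_event(event_code):
    return event_code >= NB_EVENT_MIN


class NorthBridge:
    """Allocation table shared by all cores on one package (illustrative)."""

    def __init__(self, num_counters=4):  # counter count is an assumption
        self.owners = [None] * num_counters  # counter index -> owning core

    def claim(self, core):
        """Give `core` a free shared counter, or None if all are taken."""
        for idx, owner in enumerate(self.owners):
            if owner is None:
                self.owners[idx] = core
                return idx
        return None

    def release(self, idx, core):
        # Only the owning core may free its slot.
        if self.owners[idx] == core:
            self.owners[idx] = None
```

In the patch the chosen slot is handed to the scheduler through the event_constraint structure; that plumbing is left out here.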
    • S
      perf_events: Add new start/stop PMU callbacks · d76a0812
      Stephane Eranian committed
      In certain situations, the kernel may need to stop and start the same
      event rapidly. The current PMU callbacks do not distinguish between stop
      and release (i.e., stop + free the resource). Thus, a counter may be
      released, then it will be immediately re-acquired. Event scheduling will
      again take place with no guarantee of assigning the same counter. On some
      processors, this may even lead to a failure to assign the event back due
      to competition between cores.
      
      This patch adds a new pair of callbacks to stop and restart a counter
      without actually releasing the underlying counter resource. On stop, the
      counter is stopped, its values saved and that's it. On start, the value
      is reloaded and counter is restarted (on x86, actual restart is delayed
      until perf_enable()).
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added fallback to ->enable/->disable for all other PMUs
        fixed x86_pmu_start() to call x86_pmu.enable()
        merged __x86_pmu_disable into x86_pmu_stop() ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d76a0812
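The difference between stop/start and disable/enable that this commit introduces can be modelled in a few lines. The following is a toy Python model, not the kernel's PMU interface: `Event`, `SketchPMU`, and all field names are hypothetical, and counting is simulated by writing to `value` directly.

```python
class Event:
    def __init__(self, name):
        self.name = name
        self.idx = None   # assigned hardware counter, None once released
        self.count = 0    # value saved across stop/start


class SketchPMU:
    def __init__(self, num_counters):
        self.owner = [None] * num_counters    # counter -> event holding it
        self.value = [0] * num_counters       # simulated hardware values
        self.running = [False] * num_counters

    def enable(self, event):
        """Acquire a counter (scheduling may fail) and start counting."""
        for idx, owner in enumerate(self.owner):
            if owner is None:
                self.owner[idx] = event
                event.idx = idx
                self.start(event)
                return True
        return False

    def disable(self, event):
        """Stop and release: the counter can now go to another event."""
        self.stop(event)
        self.owner[event.idx] = None
        event.idx = None

    def stop(self, event):
        """Stop and save the value, but keep the counter assigned."""
        self.running[event.idx] = False
        event.count = self.value[event.idx]

    def start(self, event):
        """Reload the saved value and resume on the same counter."""
        self.value[event.idx] = event.count
        self.running[event.idx] = True
```

Because `stop()` leaves `owner` untouched, a competing event cannot take the counter between a stop and the following start, which is the failure mode the message describes.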
    • P
      perf_events: Report the MMAP pgoff value in bytes · 3a0304e9
      Peter Zijlstra committed
      DaveM reported that currently perf interprets the pgoff value reported by
      the MMAP events as a byte range, but the kernel reports it as a page
      offset.
      
      Since it's broken (and unusable) anyway, change the kernel behaviour (ABI)
      to report bytes indeed, avoiding the need for userspace to deal with
      PAGE_SIZE things.
      Reported-by: David Miller <davem@davemloft.net>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3a0304e9
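The mismatch described in this commit message is a unit problem: a page offset has to be multiplied by the page size before it can be used as a byte offset. A minimal Python illustration follows, assuming 4 KiB pages; `PAGE_SHIFT = 12`, `pgoff_to_bytes`, and `file_offset` are hypothetical names chosen for the example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages; the real value is per-architecture


def pgoff_to_bytes(pgoff_pages, page_shift=PAGE_SHIFT):
    """Convert a page offset into the byte offset the kernel reports after this change."""
    return pgoff_pages << page_shift


def file_offset(addr, map_start, pgoff_bytes):
    """Translate a sampled address into an offset within the mapped file."""
    return addr - map_start + pgoff_bytes
```

Reporting bytes from the kernel means userspace only needs the second function and never has to know the page size of the machine that recorded the data.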
    • A
      perf annotate: Defer allocating sym_priv->hist array · 628ada0c
      Arnaldo Carvalho de Melo committed
      Defer the allocation because symbol->end is not fixed up at
      symbol_filter time, only after all symbols for a DSO are loaded.
      Until then it may be bogus for asm symbols, causing segfaults
      when hits happen in these symbols.
      Reported-by: David Miller <davem@davemloft.net>
      Reported-by: Anton Blanchard <anton@samba.org>
      Acked-by: David Miller <davem@davemloft.net>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org> # for .33.x. Does not apply cleanly, needs backport.
      LKML-Reference: <20100225155740.GB8553@ghostprotocols.net>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      628ada0c
  4. 25 Feb, 2010 12 commits
  5. 24 Feb, 2010 11 commits