1. 21 11月, 2009 4 次提交
  2. 16 11月, 2009 1 次提交
    • P
      perf_event: Optimize perf_output_lock() · 559fdc3c
      Peter Zijlstra 提交于
      The purpose of perf_output_{un,}lock() is to:
      
       1) avoid publishing incomplete data
          [ possible when publishing a head that is ahead of an entry
            that is still being written ]
      
       2) guarantee fwd progress
          [ a simple refcount on pending writers doesn't need to drop to
            0, making it so would end up implementing something like forced
            quiecent states of RCU ]
      
      To satisfy the above without undue complexity it serializes
      between CPUs, this means that a pending writer can only be the
      same cpu in a nested context, and since (under normal operation)
      a cpu always makes progress we're good -- if the head is only
      published when the bottom  most writer completes.
      
      Now we don't need to disable IRQs in order to serialize between
      CPUs, disabling preemption ought to be sufficient, esp since we
      already deal with nesting due to NMIs.
      
      This avoids potentially expensive (and needless) local IRQ
      disable/enable ops.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1258373161.26714.254.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      559fdc3c
  3. 08 11月, 2009 1 次提交
    • F
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker 提交于
      This patch rebase the implementation of the breakpoints API on top of
      perf events instances.
      
      Each breakpoints are now perf events that handle the
      register scheduling, thread/cpu attachment, etc..
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event " rename
      - The ptrace regression have been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
  4. 04 11月, 2009 2 次提交
    • F
      perf/core: Add a callback to perf events · 97eaf530
      Frederic Weisbecker 提交于
      A simple callback in a perf event can be used for multiple purposes.
      For example it is useful for triggered based events like hardware
      breakpoints that need a callback to dispatch a triggered breakpoint
      event.
      
      v2: Simplify a bit the callback attribution as suggested by Paul
          Mackerras
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mundt <lethal@linux-sh.org>
      97eaf530
    • A
      perf/core: Provide a kernel-internal interface to get to performance counters · fb0459d7
      Arjan van de Ven 提交于
      There are reasons for kernel code to ask for, and use, performance
      counters.
      For example, in CPU freq governors this tends to be a good idea, but
      there are other examples possible as well of course.
      
      This patch adds the needed bits to do enable this functionality; they
      have been tested in an experimental cpufreq driver that I'm working on,
      and the changes are all that I needed to access counters properly.
      
      [fweisbec@gmail.com: added pid to perf_event_create_kernel_counter so
      that we can profile a particular task too
      
      TODO: Have a better error reporting, don't just return NULL in fail
      case.]
      
      v2: Remove the wrong comment about the fact
          perf_event_create_kernel_counter must be called from a kernel
          thread.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Avi Kivity <avi@redhat.com>
      LKML-Reference: <20090925122556.2f8bd939@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      fb0459d7
  5. 28 10月, 2009 1 次提交
  6. 23 10月, 2009 2 次提交
    • S
      perf events: Don't generate events for the idle task when exclude_idle is set · 54f44076
      Soeren Sandmann 提交于
      Getting samples for the idle task is often not interesting, so
      don't generate them when exclude_idle is set for the event in
      question.
      Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <ye8pr8fmlq7.fsf@camel16.daimi.au.dk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      54f44076
    • S
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time... · 721a669b
      Soeren Sandmann 提交于
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
      
      Make the hrtimer based events work for sysprof.
      
      Whenever a swevent is scheduled out, the hrtimer is canceled.
      When it is scheduled back in, the timer is restarted. This
      happens every scheduler tick, which means the timer never
      expired because it was getting repeatedly restarted over and
      over with the same period.
      
      To fix that, save the remaining time when disabling; when
      reenabling, use that saved time as the period instead of the
      user-specified sampling period.
      
      Also, move the starting and stopping of the hrtimers to helper
      functions instead of duplicating the code.
      Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
      LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      721a669b
  7. 15 10月, 2009 1 次提交
  8. 14 10月, 2009 1 次提交
    • P
      perf_event: Adjust frequency and unthrottle for non-group-leader events · 03541f8b
      Paul Mackerras 提交于
      The loop in perf_ctx_adjust_freq checks the frequency of sampling
      event counters, and adjusts the event interval and unthrottles the
      event if required, and resets the interrupt count for the event.
      However, at present it only looks at group leaders.
      
      This means that a sampling event that is not a group leader will
      eventually get throttled, once its interrupt count reaches
      sysctl_perf_event_sample_rate/HZ --- and that is guaranteed to
      happen, if the event is active for long enough, since the interrupt
      count never gets reset.  Once it is throttled it never gets
      unthrottled, so it basically just stops working at that point.
      
      This fixes it by making perf_ctx_adjust_freq use ctx->event_list
      rather than ctx->group_list.  The existing spin_lock/spin_unlock
      around the loop makes it unnecessary to put rcu_read_lock/
      rcu_read_unlock around the list_for_each_entry_rcu().
      Reported-by: NMark W. Krentel <krentel@cs.rice.edu>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19157.26731.855609.165622@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      03541f8b
  9. 06 10月, 2009 1 次提交
    • P
      perf_event: Provide vmalloc() based mmap() backing · 906010b2
      Peter Zijlstra 提交于
      Some architectures such as Sparc, ARM and MIPS (basically
      everything with flush_dcache_page()) need to deal with dcache
      aliases by carefully placing pages in both kernel and user maps.
      
      These architectures typically have to use vmalloc_user() for this.
      
      However, on other architectures, vmalloc() is not needed and has
      the downsides of being more restricted and slower than regular
      allocations.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1254830228.21044.272.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      906010b2
  10. 01 10月, 2009 2 次提交
  11. 28 9月, 2009 1 次提交
  12. 21 9月, 2009 4 次提交
    • I
      perf: Tidy up after the big rename · 57c0c15b
      Ingo Molnar 提交于
       - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
      
       - provide courtesy copy of old perf_counter.h, for user-space projects
      
       - small indentation fixups
      
       - fix up MAINTAINERS
      
       - fix small x86 printout fallout
      
       - fix up small PowerPC comment fallout (use 'counter' as in register)
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      57c0c15b
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
    • I
      perf_counter: Rename 'event' to event_id/hw_event · dfc65094
      Ingo Molnar 提交于
      In preparation to the renames, to avoid a namespace clash.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dfc65094
    • I
      perf_counter: Rename list_entry -> group_entry, counter_list -> group_list · 65abc865
      Ingo Molnar 提交于
      This is in preparation of the big rename, but also makes sense
      in a standalone way: 'list_entry' is a bad name as we already
      have a list_entry() in list.h.
      
      Also, the 'counter list' is too vague, it doesnt tell us the
      purpose of that list.
      
      Clarify these names to show that it's all about the group
      hiearchy.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      65abc865
  13. 20 9月, 2009 1 次提交
    • I
      perf_counter: Fix perf_copy_attr() pointer arithmetic · cdf8073d
      Ian Schram 提交于
      There is still some weird code in per_copy_attr(). Which supposedly
      checks that all bytes trailing a struct are zero.
      
      It doesn't seem to get pointer arithmetic right. Since it
      increments an iterating pointer by sizeof(unsigned long) rather
      than 1.
      Signed-off-by: NIan Schram <ischram@telenet.be>
      [ v2: clean up the messy PTR_ALIGN logic as well. ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@kernel.org> # for v2.6.31.x
      LKML-Reference: <4AB3DEE2.3030600@telenet.be>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdf8073d
  14. 19 9月, 2009 4 次提交
  15. 18 9月, 2009 2 次提交
  16. 15 9月, 2009 1 次提交
  17. 13 9月, 2009 1 次提交
    • I
      perf_counter: Allow mmap if paranoid checks are turned off · 459ec28a
      Ingo Molnar 提交于
      Before:
      
        $ perf sched record -f sleep 1
        Error: failed to mmap with 1 (Operation not permitted)
      
      After:
      
        $ perf sched record -f sleep 1
        [ perf record: Captured and wrote 0.095 MB perf.data (~4161 samples) ]
      
      Note, this is only allowed if perfcounter_paranoid is set to
      the most permissive (non-default) value of -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      459ec28a
  18. 04 9月, 2009 1 次提交
    • I
      perf_counter: Fix output-sharing error path · dc86cabe
      Ingo Molnar 提交于
      We forget to release the fd in the PERF_FLAG_FD_OUTPUT
      error path.
      
      Reorganize the error flow here to be a clean fall-through
      logic.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dc86cabe
  19. 03 9月, 2009 1 次提交
  20. 29 8月, 2009 1 次提交
    • P
      perf_counter: Fix /0 bug in swcounters · eced1dfc
      Peter Zijlstra 提交于
      We have a race in the swcounter stuff where we can start
      counting a counter that has never been enabled, this leads to a
      /0 situation.
      
      The below avoids the /0 but doesn't close the race, this would
      need a new counter state.
      
      The race is due to perf_swcounter_is_counting() which cannot
      discern between disabled due to scheduled out, and disabled for
      any other reason.
      
      Such a crash has been seen by Ingo:
      
      [  967.092372] divide error: 0000 [#1] SMP
      [  967.096499] last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
      [  967.104846] CPU 5
      [  967.106965] Modules linked in:
      [  967.110169] Pid: 3351, comm: hackbench Not tainted 2.6.31-rc8-tip-01158-gd940a54-dirty #1568 X8DTN
      [  967.119456] RIP: 0010:[<ffffffff810c0aba>]  [<ffffffff810c0aba>] perf_swcounter_ctx_event+0x127/0x1af
      [  967.129137] RSP: 0018:ffff8801a95abd70  EFLAGS: 00010046
      [  967.134699] RAX: 0000000000000002 RBX: ffff8801bd645c00 RCX: 0000000000000002
      [  967.142162] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8801bd645d40
      [  967.149584] RBP: ffff8801a95abdb0 R08: 0000000000000001 R09: ffff8801a95abe00
      [  967.157042] R10: 0000000000000037 R11: ffff8801aa1245f8 R12: ffff8801a95abe00
      [  967.164481] R13: ffff8801a95abe00 R14: ffff8801aa1c0e78 R15: 0000000000000001
      [  967.171953] FS:  0000000000000000(0000) GS:ffffc90000a00000(0063) knlGS:00000000f7f486c0
      [  967.180406] CS:  0010 DS: 002b ES: 002b CR0: 000000008005003b
      [  967.186374] CR2: 000000004822c0ac CR3: 00000001b19a2000 CR4: 00000000000006e0
      [  967.193770] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  967.201224] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  967.208692] Process hackbench (pid: 3351, threadinfo ffff8801a95aa000, task ffff8801a96b0000)
      [  967.217607] Stack:
      [  967.219711]  0000000000000000 0000000000000037 0000000200000001 ffffc90000a1107c
      [  967.227296] <0> ffff8801a95abe00 0000000000000001 0000000000000001 0000000000000037
      [  967.235333] <0> ffff8801a95abdf0 ffffffff810c0c20 0000000200a14f30 ffff8801a95abe40
      [  967.243532] Call Trace:
      [  967.246103]  [<ffffffff810c0c20>] do_perf_swcounter_event+0xde/0xec
      [  967.252635]  [<ffffffff810c0ca7>] perf_tpcounter_event+0x79/0x7b
      [  967.258957]  [<ffffffff81037f73>] ftrace_profile_sched_switch+0xc0/0xcb
      [  967.265791]  [<ffffffff8155f22d>] schedule+0x429/0x4c4
      [  967.271156]  [<ffffffff8100c01e>] int_careful+0xd/0x14
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1251472247.17617.74.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      eced1dfc
  21. 28 8月, 2009 1 次提交
    • I
      perf_counters: Increase paranoia level · 6bb56347
      Ingo Molnar 提交于
      Per-cpu counters are an ASLR information leak as they show
      the execution other tasks do. Increase the paranoia level
      to 1, which disallows per-cpu counters. (they still allow
      counting/profiling of own tasks - and admin can profile
      everything.)
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6bb56347
  22. 25 8月, 2009 2 次提交
    • P
      perf_counter: Allow sharing of output channels · a4be7c27
      Peter Zijlstra 提交于
      Provide the ability to configure a counter to send its output
      to another (already existing) counter's output stream.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090819092023.980284148@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4be7c27
    • P
      perf_counter: Start counting time enabled when group leader gets enabled · fa289bec
      Paul Mackerras 提交于
      Currently, if a group is created where the group leader is
      initially disabled but a non-leader member is initially
      enabled, and then the leader is subsequently enabled some time
      later, the time_enabled for the non-leader member will reflect
      the whole time since it was created, not just the time since
      the leader was enabled.
      
      This is incorrect, because all of the members are effectively
      disabled while the leader is disabled, since none of the
      members can go on the PMU if the leader can't.
      
      Thus we have to update the ->tstamp_enabled for all the enabled
      group members when a group leader is enabled, so that the
      time_enabled computation only counts the time since the leader
      was enabled.
      
      Similarly, when disabling a group leader we have to update the
      time_enabled and time_running for all of the group members.
      
      Also, in update_counter_times, we have to treat a counter whose
      group leader is disabled as being disabled.
      Reported-by: NStephane Eranian <eranian@googlemail.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      LKML-Reference: <19091.29664.342227.445006@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fa289bec
  23. 22 8月, 2009 1 次提交
    • P
      perf_counter: Fix typo in read() output generation · 4464fcaa
      Peter Zijlstra 提交于
      When you iterate a list, using the iterator is useful.
      
      Before:
      
         ID: 5
         ID: 5
         ID: 5
         ID: 5
         EVNT: 0x40088b scale: nan ID: 5 CNT: 1006252 ID: 6 CNT: 1011090 ID: 7 CNT: 1011196 ID: 8 CNT: 1011095
         EVNT: 0x40088c scale: 1.000000 ID: 5 CNT: 2003065 ID: 6 CNT: 2011671 ID: 7 CNT: 2012620 ID: 8 CNT: 2013479
         EVNT: 0x40088c scale: 1.000000 ID: 5 CNT: 3002390 ID: 6 CNT: 3015996 ID: 7 CNT: 3018019 ID: 8 CNT: 3020006
         EVNT: 0x40088b scale: 1.000000 ID: 5 CNT: 4002406 ID: 6 CNT: 4021120 ID: 7 CNT: 4024241 ID: 8 CNT: 4027059
      
      After:
      
         ID: 1
         ID: 2
         ID: 3
         ID: 4
         EVNT: 0x400889 scale: nan ID: 1 CNT: 1005270 ID: 2 CNT: 1009833 ID: 3 CNT: 1010065 ID: 4 CNT: 1010088
         EVNT: 0x400898 scale: nan ID: 1 CNT: 2001531 ID: 2 CNT: 2022309 ID: 3 CNT: 2022470 ID: 4 CNT: 2022627
         EVNT: 0x400888 scale: 0.489467 ID: 1 CNT: 3001261 ID: 2 CNT: 3027088 ID: 3 CNT: 3027941 ID: 4 CNT: 3028762
      Reported-by: Nstephane eranian <eranian@googlemail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey J Ashford <cjashfor@us.ibm.com>
      Cc: perfmon2-devel <perfmon2-devel@lists.sourceforge.net>
      LKML-Reference: <1250867976.7538.73.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4464fcaa
  24. 18 8月, 2009 1 次提交
    • I
      perf_counter: Fix the PARISC build · f738eb1b
      Ingo Molnar 提交于
      PARISC does not build:
      
      /home/mingo/tip/kernel/perf_counter.c: In function 'perf_counter_index':
      /home/mingo/tip/kernel/perf_counter.c:2016: error: 'PERF_COUNTER_INDEX_OFFSET' undeclared (first use in this function)
      /home/mingo/tip/kernel/perf_counter.c:2016: error: (Each undeclared identifier is reported only once
      /home/mingo/tip/kernel/perf_counter.c:2016: error: for each function it appears in.)
      
      As PERF_COUNTER_INDEX_OFFSET is not defined.
      
      Now, we could define it in the architecture - but lets also provide
      a core default of 0 (which happens to be what all but one
      architecture uses at the moment).
      
      Architectures that need a different index offset should set this
      value in their asm/perf_counter.h files.
      
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f738eb1b
  25. 17 8月, 2009 1 次提交
    • P
      perf_counter: Check task on counter read IPI · e1ac3614
      Paul Mackerras 提交于
      In general, code in perf_counter.c that is called through an
      IPI checks, for per-task counters, that the counter's task is
      still the current task.  This is to handle the race condition
      where the cpu switches from the task we want to another task in
      the interval between sending the IPI and the IPI arriving and
      being handled on the target CPU.
      
      For some reason, __perf_counter_read is missing this check, yet
      there is no reason why the race condition can't occur.  This
      adds a check that the current task is the one we want.  If it
      isn't, we just return.  In that case the counter->count value
      should be up to date, since it will have been updated when the
      counter was scheduled out, which must have happened since the
      IPI was sent.
      
      I don't have an example of an actual failure due to this race,
      but it seems obvious that it could occur and we need to guard
      against it.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19076.63614.277861.368125@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e1ac3614
  26. 13 8月, 2009 1 次提交
    • P
      perf_counter: Report the cloning task as parent on perf_counter_fork() · 94d5d1b2
      Peter Zijlstra 提交于
      A bug in (9f498cc5: perf_counter: Full task tracing) makes
      profiling multi-threaded apps it go belly up.
      
      [ output as: (PID:TID):(PPID:PTID) ]
      
       # ./perf report -D | grep FORK
      0x4b0 [0x18]: PERF_EVENT_FORK: (3237:3237):(3236:3236)
      0xa10 [0x18]: PERF_EVENT_FORK: (3237:3238):(3236:3236)
      0xa70 [0x18]: PERF_EVENT_FORK: (3237:3239):(3236:3236)
      0xad0 [0x18]: PERF_EVENT_FORK: (3237:3240):(3236:3236)
      0xb18 [0x18]: PERF_EVENT_FORK: (3237:3241):(3236:3236)
      
      Shows us that the test (27d028de perf report: Update for the new
      FORK/EXIT events) in builtin-report.c:
      
              /*
               * A thread clone will have the same PID for both
               * parent and child.
               */
              if (thread == parent)
                      return 0;
      
      Will clearly fail.
      
      The problem is that perf_counter_fork() reports the actual
      parent, instead of the cloning thread.
      
      Fixing that (with the below patch), yields:
      
       # ./perf report -D | grep FORK
      0x4c8 [0x18]: PERF_EVENT_FORK: (1590:1590):(1589:1589)
      0xbd8 [0x18]: PERF_EVENT_FORK: (1590:1591):(1590:1590)
      0xc80 [0x18]: PERF_EVENT_FORK: (1590:1592):(1590:1590)
      0x3338 [0x18]: PERF_EVENT_FORK: (1590:1593):(1590:1590)
      0x66b0 [0x18]: PERF_EVENT_FORK: (1590:1594):(1590:1590)
      
      Which both makes more sense and doesn't confuse perf report
      anymore.
      Reported-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: paulus@samba.org
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <1250172882.5241.62.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      94d5d1b2