1. 21 11月, 2009 12 次提交
  2. 16 11月, 2009 1 次提交
    • P
      perf_event: Optimize perf_output_lock() · 559fdc3c
      Peter Zijlstra 提交于
      The purpose of perf_output_{un,}lock() is to:
      
       1) avoid publishing incomplete data
          [ possible when publishing a head that is ahead of an entry
            that is still being written ]
      
       2) guarantee fwd progress
          [ a simple refcount on pending writers doesn't need to drop to
            0, making it so would end up implementing something like forced
            quiecent states of RCU ]
      
      To satisfy the above without undue complexity it serializes
      between CPUs, this means that a pending writer can only be the
      same cpu in a nested context, and since (under normal operation)
      a cpu always makes progress we're good -- if the head is only
      published when the bottom  most writer completes.
      
      Now we don't need to disable IRQs in order to serialize between
      CPUs, disabling preemption ought to be sufficient, esp since we
      already deal with nesting due to NMIs.
      
      This avoids potentially expensive (and needless) local IRQ
      disable/enable ops.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1258373161.26714.254.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      559fdc3c
  3. 08 11月, 2009 1 次提交
    • F
      hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Frederic Weisbecker 提交于
      This patch rebase the implementation of the breakpoints API on top of
      perf events instances.
      
      Each breakpoints are now perf events that handle the
      register scheduling, thread/cpu attachment, etc..
      
      The new layering is now made as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons of this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoints setting remains tricky and still needs some per
        thread breakpoints references.
      
      Todo (in the order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event " rename
      - The ptrace regression have been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD doesn't anymore cover every cases of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c
  4. 04 11月, 2009 2 次提交
    • F
      perf/core: Add a callback to perf events · 97eaf530
      Frederic Weisbecker 提交于
      A simple callback in a perf event can be used for multiple purposes.
      For example it is useful for triggered based events like hardware
      breakpoints that need a callback to dispatch a triggered breakpoint
      event.
      
      v2: Simplify a bit the callback attribution as suggested by Paul
          Mackerras
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mundt <lethal@linux-sh.org>
      97eaf530
    • A
      perf/core: Provide a kernel-internal interface to get to performance counters · fb0459d7
      Arjan van de Ven 提交于
      There are reasons for kernel code to ask for, and use, performance
      counters.
      For example, in CPU freq governors this tends to be a good idea, but
      there are other examples possible as well of course.
      
      This patch adds the needed bits to do enable this functionality; they
      have been tested in an experimental cpufreq driver that I'm working on,
      and the changes are all that I needed to access counters properly.
      
      [fweisbec@gmail.com: added pid to perf_event_create_kernel_counter so
      that we can profile a particular task too
      
      TODO: Have a better error reporting, don't just return NULL in fail
      case.]
      
      v2: Remove the wrong comment about the fact
          perf_event_create_kernel_counter must be called from a kernel
          thread.
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: "K.Prasad" <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Avi Kivity <avi@redhat.com>
      LKML-Reference: <20090925122556.2f8bd939@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      fb0459d7
  5. 28 10月, 2009 1 次提交
  6. 23 10月, 2009 2 次提交
    • S
      perf events: Don't generate events for the idle task when exclude_idle is set · 54f44076
      Soeren Sandmann 提交于
      Getting samples for the idle task is often not interesting, so
      don't generate them when exclude_idle is set for the event in
      question.
      Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <ye8pr8fmlq7.fsf@camel16.daimi.au.dk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      54f44076
    • S
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time... · 721a669b
      Soeren Sandmann 提交于
      perf events: Fix swevent hrtimer sampling by keeping track of remaining time when enabling/disabling swevent hrtimers
      
      Make the hrtimer based events work for sysprof.
      
      Whenever a swevent is scheduled out, the hrtimer is canceled.
      When it is scheduled back in, the timer is restarted. This
      happens every scheduler tick, which means the timer never
      expired because it was getting repeatedly restarted over and
      over with the same period.
      
      To fix that, save the remaining time when disabling; when
      reenabling, use that saved time as the period instead of the
      user-specified sampling period.
      
      Also, move the starting and stopping of the hrtimers to helper
      functions instead of duplicating the code.
      Signed-off-by: NSøren Sandmann Pedersen <sandmann@redhat.com>
      LKML-Reference: <ye8vdi7mluz.fsf@camel16.daimi.au.dk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      721a669b
  7. 15 10月, 2009 1 次提交
  8. 14 10月, 2009 1 次提交
    • P
      perf_event: Adjust frequency and unthrottle for non-group-leader events · 03541f8b
      Paul Mackerras 提交于
      The loop in perf_ctx_adjust_freq checks the frequency of sampling
      event counters, and adjusts the event interval and unthrottles the
      event if required, and resets the interrupt count for the event.
      However, at present it only looks at group leaders.
      
      This means that a sampling event that is not a group leader will
      eventually get throttled, once its interrupt count reaches
      sysctl_perf_event_sample_rate/HZ --- and that is guaranteed to
      happen, if the event is active for long enough, since the interrupt
      count never gets reset.  Once it is throttled it never gets
      unthrottled, so it basically just stops working at that point.
      
      This fixes it by making perf_ctx_adjust_freq use ctx->event_list
      rather than ctx->group_list.  The existing spin_lock/spin_unlock
      around the loop makes it unnecessary to put rcu_read_lock/
      rcu_read_unlock around the list_for_each_entry_rcu().
      Reported-by: NMark W. Krentel <krentel@cs.rice.edu>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <19157.26731.855609.165622@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      03541f8b
  9. 06 10月, 2009 1 次提交
    • P
      perf_event: Provide vmalloc() based mmap() backing · 906010b2
      Peter Zijlstra 提交于
      Some architectures such as Sparc, ARM and MIPS (basically
      everything with flush_dcache_page()) need to deal with dcache
      aliases by carefully placing pages in both kernel and user maps.
      
      These architectures typically have to use vmalloc_user() for this.
      
      However, on other architectures, vmalloc() is not needed and has
      the downsides of being more restricted and slower than regular
      allocations.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1254830228.21044.272.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      906010b2
  10. 01 10月, 2009 2 次提交
  11. 28 9月, 2009 1 次提交
  12. 21 9月, 2009 4 次提交
    • I
      perf: Tidy up after the big rename · 57c0c15b
      Ingo Molnar 提交于
       - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
      
       - provide courtesy copy of old perf_counter.h, for user-space projects
      
       - small indentation fixups
      
       - fix up MAINTAINERS
      
       - fix small x86 printout fallout
      
       - fix up small PowerPC comment fallout (use 'counter' as in register)
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      57c0c15b
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
    • I
      perf_counter: Rename 'event' to event_id/hw_event · dfc65094
      Ingo Molnar 提交于
      In preparation to the renames, to avoid a namespace clash.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dfc65094
    • I
      perf_counter: Rename list_entry -> group_entry, counter_list -> group_list · 65abc865
      Ingo Molnar 提交于
      This is in preparation of the big rename, but also makes sense
      in a standalone way: 'list_entry' is a bad name as we already
      have a list_entry() in list.h.
      
      Also, the 'counter list' is too vague, it doesnt tell us the
      purpose of that list.
      
      Clarify these names to show that it's all about the group
      hiearchy.
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      65abc865
  13. 20 9月, 2009 1 次提交
    • I
      perf_counter: Fix perf_copy_attr() pointer arithmetic · cdf8073d
      Ian Schram 提交于
      There is still some weird code in per_copy_attr(). Which supposedly
      checks that all bytes trailing a struct are zero.
      
      It doesn't seem to get pointer arithmetic right. Since it
      increments an iterating pointer by sizeof(unsigned long) rather
      than 1.
      Signed-off-by: NIan Schram <ischram@telenet.be>
      [ v2: clean up the messy PTR_ALIGN logic as well. ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: <stable@kernel.org> # for v2.6.31.x
      LKML-Reference: <4AB3DEE2.3030600@telenet.be>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdf8073d
  14. 19 9月, 2009 4 次提交
  15. 18 9月, 2009 2 次提交
  16. 15 9月, 2009 1 次提交
  17. 13 9月, 2009 1 次提交
    • I
      perf_counter: Allow mmap if paranoid checks are turned off · 459ec28a
      Ingo Molnar 提交于
      Before:
      
        $ perf sched record -f sleep 1
        Error: failed to mmap with 1 (Operation not permitted)
      
      After:
      
        $ perf sched record -f sleep 1
        [ perf record: Captured and wrote 0.095 MB perf.data (~4161 samples) ]
      
      Note, this is only allowed if perfcounter_paranoid is set to
      the most permissive (non-default) value of -1.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      459ec28a
  18. 04 9月, 2009 1 次提交
    • I
      perf_counter: Fix output-sharing error path · dc86cabe
      Ingo Molnar 提交于
      We forget to release the fd in the PERF_FLAG_FD_OUTPUT
      error path.
      
      Reorganize the error flow here to be a clean fall-through
      logic.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dc86cabe
  19. 03 9月, 2009 1 次提交