1. 19 4月, 2010 1 次提交
  2. 15 4月, 2010 3 次提交
    • F
      perf: Fix hlist related build error · 95476b64
      Frederic Weisbecker 提交于
      hlist helpers need to be available for all software events, not
      only trace events.
      
      Pull them out outside the ifdef CONFIG_EVENT_TRACING section.
      
      Fixes:
      	kernel/perf_event.c:4573: error: implicit declaration of function 'swevent_hlist_put'
      	kernel/perf_event.c:4614: error: implicit declaration of function 'swevent_hlist_get'
      	kernel/perf_event.c:5534: error: implicit declaration of function 'swevent_hlist_release
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1271281338-23491-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      95476b64
    • F
      perf: Make clock software events consistent with general exclusion rules · df8290bf
      Frederic Weisbecker 提交于
      The cpu/task clock events implement their own version of exclusion
      on top of exclude_user and exclude_kernel.
      
      The result is that when the event triggered in the kernel but we
      have exclude_kernel set, we try to rewind using task_pt_regs.
      There are two side effects of this:
      
      - we call task_pt_regs even on kernel threads, which doesn't give
        us the desired result.
      - if the event occured in the kernel, we shouldn't rewind to the
        user context. We want to actually ignore the event.
      
      get_irq_regs() will always give us the right interrupted context, so
      use its result and submit it to perf_exclude_context() that knows
      when an event must be ignored.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      df8290bf
    • F
      perf: Store active software events in a hashlist · 76e1d904
      Frederic Weisbecker 提交于
      Each time a software event triggers, we need to walk through
      the entire list of events from the current cpu and task contexts
      to retrieve a running perf event that matches.
      We also need to check a matching perf event is actually counting.
      
      This walk is wasteful and makes the event fast path scaling
      down with a growing number of events running on the same
      contexts.
      
      To solve this, we store the running perf events in a hashlist to
      get an immediate access to them against their type:event_id when
      they trigger.
      
      v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic
            maths along the way)
          - Only allocate hlist for online cpus, but keep track of the
            refcount on offline possible cpus too, so that we allocate it
            if needed when it becomes online.
          - Drop the kref use as it's not adapted to our tricks anymore.
      
      v3: - Fix bad refcount check (address instead of value). Thanks to
            Eric Dumazet who spotted this.
          - While exiting cpu, move the hlist release out of the IPI path
            to lock the hlist mutex sanely.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      76e1d904
  3. 04 4月, 2010 1 次提交
  4. 03 4月, 2010 2 次提交
  5. 01 4月, 2010 1 次提交
    • F
      perf: Use hot regs with software sched switch/migrate events · e49a5bd3
      Frederic Weisbecker 提交于
      Scheduler's task migration events don't work because they always
      pass NULL regs perf_sw_event(). The event hence gets filtered
      in perf_swevent_add().
      
      Scheduler's context switches events use task_pt_regs() to get
      the context when the event occured which is a wrong thing to
      do as this won't give us the place in the kernel where we went
      to sleep but the place where we left userspace. The result is
      even more wrong if we switch from a kernel thread.
      
      Use the hot regs snapshot for both events as they belong to the
      non-interrupt/exception based events family. Unlike page faults
      or so that provide the regs matching the exact origin of the event,
      we need to save the current context.
      
      This makes the task migration event working and fix the context
      switch callchains and origin ip.
      
      Example: perf record -a -e cs
      
      Before:
      
          10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                      |
                      --- (nil)
                          perf_callchain
                          perf_prepare_sample
                          __perf_event_overflow
                          perf_swevent_overflow
                          perf_swevent_add
                          perf_swevent_ctx_event
                          do_perf_sw_event
                          __perf_sw_event
                          perf_event_task_sched_out
                          schedule
                          run_ksoftirqd
                          kthread
                          kernel_thread_helper
      
      After:
      
          23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
                  |
                  --- schedule
                     |
                     |--60.00%-- schedule_timeout
                     |          wait_for_common
                     |          wait_for_completion
                     |          blk_execute_rq
                     |          scsi_execute
                     |          scsi_execute_req
                     |          sr_test_unit_ready
                     |          |
                     |          |--66.67%-- sr_media_change
                     |          |          media_changed
                     |          |          cdrom_media_changed
                     |          |          sr_block_media_changed
                     |          |          check_disk_change
                     |          |          cdrom_open
      
      v2: Always build perf_arch_fetch_caller_regs() now that software
      events need that too. They don't need it from modules, unlike trace
      events, so we keep the EXPORT_SYMBOL in trace_event_perf.c
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      e49a5bd3
  6. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  7. 17 3月, 2010 2 次提交
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · dcd5c166
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd5c166
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · a6b84574
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a6b84574
  8. 16 3月, 2010 1 次提交
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · 1d199b1a
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1d199b1a
  9. 11 3月, 2010 2 次提交
    • E
      perf_events: Improve task_sched_in() · 9b33fa6b
      eranian@google.com 提交于
      This patch is an optimization in perf_event_task_sched_in() to avoid
      scheduling the events twice in a row.
      
      Without it, the perf_disable()/perf_enable() pair is invoked twice,
      thereby pinned events counts while scheduling flexible events and we go
      throuh hw_perf_enable() twice.
      
      By encapsulating, the whole sequence into perf_disable()/perf_enable() we
      ensure, hw_perf_enable() is going to be invoked only once because of the
      refcount protection.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1268288765-5326-1-git-send-email-eranian@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9b33fa6b
    • P
      perf_event: Fix oops triggered by cpu offline/online · 220b140b
      Paul Mackerras 提交于
      Anton Blanchard found that he could reliably make the kernel hit a
      BUG_ON in the slab allocator by taking a cpu offline and then online
      while a system-wide perf record session was running.
      
      The reason is that when the cpu comes up, we completely reinitialize
      the ctx field of the struct perf_cpu_context for the cpu.  If there is
      a system-wide perf record session running, then there will be a struct
      perf_event that has a reference to the context, so its refcount will
      be 2.  (The perf_event has been removed from the context's group_entry
      and event_entry lists by perf_event_exit_cpu(), but that doesn't
      remove the perf_event's reference to the context and doesn't decrement
      the context's refcount.)
      
      When the cpu comes up, perf_event_init_cpu() gets called, and it calls
      __perf_event_init_context() on the cpu's context.  That resets the
      refcount to 1.  Then when the perf record session finishes and the
      perf_event is closed, the refcount gets decremented to 0 and the
      context gets kfreed after an RCU grace period.  Since the context
      wasn't kmalloced -- it's part of a per-cpu variable -- bad things
      happen.
      
      In fact we don't need to completely reinitialize the context when the
      cpu comes up.  It's sufficient to initialize the context once at boot,
      but we need to do it for all possible cpus.
      
      This moves the context initialization to happen at boot time.  With
      this, we don't trash the refcount and the context never gets kfreed,
      and we don't hit the BUG_ON.
      Reported-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Tested-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      220b140b
  10. 10 3月, 2010 7 次提交
    • F
      perf: Drop the obsolete profile naming for trace events · 97d5a220
      Frederic Weisbecker 提交于
      Drop the obsolete "profile" naming used by perf for trace events.
      Perf can now do more than simple events counting, so generalize
      the API naming.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      97d5a220
    • F
      perf: Take a hot regs snapshot for trace events · c530665c
      Frederic Weisbecker 提交于
      We are taking a wrong regs snapshot when a trace event triggers.
      Either we use get_irq_regs(), which gives us the interrupted
      registers if we are in an interrupt, or we use task_pt_regs()
      which gives us the state before we entered the kernel, assuming
      we are lucky enough to be no kernel thread, in which case
      task_pt_regs() returns the initial set of regs when the kernel
      thread was started.
      
      What we want is different. We need a hot snapshot of the regs,
      so that we can get the instruction pointer to record in the
      sample, the frame pointer for the callchain, and some other
      things.
      
      Let's use the new perf_fetch_caller_regs() for that.
      
      Comparison with perf record -e lock: -R -a -f -g
      Before:
      
              perf  [kernel]                   [k] __do_softirq
                     |
                     --- __do_softirq
                        |
                        |--55.16%-- __open
                        |
                         --44.84%-- __write_nocancel
      
      After:
      
                  perf  [kernel]           [k] perf_tp_event
                     |
                     --- perf_tp_event
                        |
                        |--41.07%-- lock_acquire
                        |          |
                        |          |--39.36%-- _raw_spin_lock
                        |          |          |
                        |          |          |--7.81%-- hrtimer_interrupt
                        |          |          |          smp_apic_timer_interrupt
                        |          |          |          apic_timer_interrupt
      
      The old case was producing unreliable callchains. Now having
      right frame and instruction pointers, we have the trace we
      want.
      
      Also syscalls and kprobe events already have the right regs,
      let's use them instead of wasting a retrieval.
      
      v2: Follow the rename perf_save_regs() -> perf_fetch_caller_regs()
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      c530665c
    • F
      perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot · 5331d7b8
      Frederic Weisbecker 提交于
      Events that trigger overflows by interrupting a context can
      use get_irq_regs() or task_pt_regs() to retrieve the state
      when the event triggered. But this is not the case for some
      other class of events like trace events as tracepoints are
      executed in the same context than the code that triggered
      the event.
      
      It means we need a different api to capture the regs there,
      namely we need a hot snapshot to get the most important
      informations for perf: the instruction pointer to get the
      event origin, the frame pointer for the callchain, the code
      segment for user_mode() tests (we always use __KERNEL_CS as
      trace events always occur from the kernel) and the eflags
      for further purposes.
      
      v2: rename perf_save_regs to perf_fetch_caller_regs as per
      Masami's suggestion.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
      5331d7b8
    • P
      perf: Provide better condition for event rotation · d4944a06
      Peter Zijlstra 提交于
      Try to avoid useless rotation and PMU disables.
      
      [ Could be improved by keeping a nr_runnable count to better account
        for the < PERF_STAT_INACTIVE counters ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d4944a06
    • P
      perf: Optimize perf_disable · 32975a4f
      Peter Zijlstra 提交于
      Currently we always call hw_perf_disable(), even if its already disabled,
      this seems superflous, esp. since it cannot be made NMI safe (see further
      patches).
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      32975a4f
    • P
      perf: Rework and fix the arch CPU-hotplug hooks · 3f6da390
      Peter Zijlstra 提交于
      Remove the hw_perf_event_*() hotplug hooks in favour of per PMU hotplug
      notifiers. This has the advantage of reducing the static weak interface
      as well as exposing all hotplug actions to the PMU.
      
      Use this to fix x86 hotplug usage where we did things in ONLINE which
      should have been done in UP_PREPARE or STARTING.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100305154128.736225361@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3f6da390
    • P
      perf: Provide generic perf_sample_data initialization · dc1d628a
      Peter Zijlstra 提交于
      This makes it easier to extend perf_sample_data and fixes a bug on arm
      and sparc, which failed to set ->raw to NULL, which can cause crashes
      when combined with PERF_SAMPLE_RAW.
      
      It also optimizes PowerPC and tracepoint, because the struct
      initialization is forced to zero out the whole structure.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NJean Pihet <jpihet@mvista.com>
      Reviewed-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Jamie Iles <jamie.iles@picochip.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@kernel.org
      LKML-Reference: <20100304140100.315416040@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dc1d628a
  11. 08 3月, 2010 1 次提交
    • A
      sysdev: Pass attribute in sysdev_class attributes show/store · c9be0a36
      Andi Kleen 提交于
      Passing the attribute to the low level IO functions allows all kinds
      of cleanups, by sharing low level IO code without requiring
      an own function for every piece of data.
      
      Also drivers can extend the attributes with own data fields
      and use that in the low level function.
      
      Similar to sysdev_attributes and normal attributes.
      
      This is a tree-wide sweep, converting everything in one go.
      
      No functional changes in this patch other than passing the new
      argument everywhere.
      
      Tested on x86, the non x86 parts are uncompiled.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      
      c9be0a36
  12. 07 3月, 2010 1 次提交
  13. 02 3月, 2010 1 次提交
  14. 27 2月, 2010 1 次提交
  15. 26 2月, 2010 4 次提交
    • P
      perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in() · 6e37738a
      Peter Zijlstra 提交于
      Since the cpu argument to hw_perf_group_sched_in() is always
      smp_processor_id(), simplify the code a little by removing this argument
      and using the current cpu where needed.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1265890918.5396.3.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6e37738a
    • S
      perf_events, x86: AMD event scheduling · 38331f62
      Stephane Eranian 提交于
      This patch adds correct AMD NorthBridge event scheduling.
      
      NB events are events measuring L3 cache, Hypertransport traffic. They are
      identified by an event code >= 0xe0. They measure events on the
      Northbride which is shared by all cores on a package. NB events are
      counted on a shared set of counters. When a NB event is programmed in a
      counter, the data actually comes from a shared counter. Thus, access to
      those counters needs to be synchronized.
      
      We implement the synchronization such that no two cores can be measuring
      NB events using the same counters. Thus, we maintain a per-NB allocation
      table. The available slot is propagated using the event_constraint
      structure.
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703957.0702d00a.6bf2.7b7d@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      38331f62
    • S
      perf_events: Add new start/stop PMU callbacks · d76a0812
      Stephane Eranian 提交于
      In certain situations, the kernel may need to stop and start the same
      event rapidly. The current PMU callbacks do not distinguish between stop
      and release (i.e., stop + free the resource). Thus, a counter may be
      released, then it will be immediately re-acquired. Event scheduling will
      again take place with no guarantee to assign the same counter. On some
      processors, this may event yield to failure to assign the event back due
      to competion between cores.
      
      This patch is adding a new pair of callback to stop and restart a counter
      without actually release the underlying counter resource. On stop, the
      counter is stopped, its values saved and that's it. On start, the value
      is reloaded and counter is restarted (on x86, actual restart is delayed
      until perf_enable()).
      Signed-off-by: NStephane Eranian <eranian@google.com>
      [ added fallback to ->enable/->disable for all other PMUs
        fixed x86_pmu_start() to call x86_pmu.enable()
        merged __x86_pmu_disable into x86_pmu_stop() ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d76a0812
    • P
      perf_events: Report the MMAP pgoff value in bytes · 3a0304e9
      Peter Zijlstra 提交于
      DaveM reported that currently perf interprets the pgoff value reported by
      the MMAP events as a byte range, but the kernel reports it as a page
      offset.
      
      Since its broken (and unusable) anyway, change the kernel behaviour (ABI)
      to report bytes indeed, avoiding the need for userspace to deal with
      PAGE_SIZE things.
      Reported-by: NDavid Miller <davem@davemloft.net>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a0304e9
  16. 15 2月, 2010 1 次提交
    • P
      perf_events: Fix FORK events · 6f93d0a7
      Peter Zijlstra 提交于
      Commit 22e19085 ("Honour event state for aux stream data")
      introduced a bug where we would drop FORK events.
      
      The thing is that we deliver FORK events to the child process'
      event, which at that time will be PERF_EVENT_STATE_INACTIVE
      because the child won't be scheduled in (we're in the middle of
      fork).
      
      Solve this twice, change the event state filter to exclude only
      disabled (STATE_OFF) or worse, and deliver FORK events to the
      current (parent).
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      LKML-Reference: <1266142324.5273.411.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6f93d0a7
  17. 04 2月, 2010 2 次提交
  18. 29 1月, 2010 1 次提交
    • P
      perf_events: Fix sample_period transfer on inherit · 75c9f328
      Peter Zijlstra 提交于
      One problem with frequency driven counters is that we cannot
      predict the rate at which they trigger, therefore we have to
      start them at period=1, this causes a ramp up effect. However,
      if we fail to propagate the stable state on fork each new child
      will have to ramp up again. This can lead to significant
      artifacts in sample data.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: eranian@google.com
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1264752266.4283.2121.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      75c9f328
  19. 27 1月, 2010 1 次提交
    • P
      perf: Reimplement frequency driven sampling · abd50713
      Peter Zijlstra 提交于
      There was a bug in the old period code that caused intel_pmu_enable_all()
      or native_write_msr_safe() to show up quite high in the profiles.
      
      In staring at that code it made my head hurt, so I rewrote it in a
      hopefully simpler fashion. Its now fully symetric between tick and
      overflow driven adjustments and uses less data to boot.
      
      The only complication is that it basically wants to do a u128 division.
      The code approximates that in a rather simple truncate until it fits
      fashion, taking care to balance the terms while truncating.
      
      This version does not generate that sampling artefact.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      abd50713
  20. 21 1月, 2010 1 次提交
  21. 17 1月, 2010 4 次提交
    • F
      perf: Better order flexible and pinned scheduling · 329c0e01
      Frederic Weisbecker 提交于
      When a task gets scheduled in. We don't touch the cpu bound events
      so the priority order becomes:
      
      	cpu pinned, cpu flexible, task pinned, task flexible.
      
      So schedule out cpu flexibles when a new task context gets in
      and correctly order the groups to schedule in:
      
      	task pinned, cpu flexible, task flexible.
      
      Cpu pinned groups don't need to be touched at this time.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      329c0e01
    • F
      perf: Don't schedule out/in pinned events on task tick · 7defb0f8
      Frederic Weisbecker 提交于
      We don't need to schedule in/out pinned events on task tick,
      now that pinned and flexible groups can be scheduled separately.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      7defb0f8
    • F
      perf: Allow pinned and flexible groups to be scheduled separately · 5b0311e1
      Frederic Weisbecker 提交于
      Tune the scheduling helpers so that we can choose to schedule either
      pinned and/or flexible groups from a context.
      
      And while at it, refactor a bit the naming of these helpers to make
      these more consistent and flexible.
      
      There is no (intended) change in scheduling behaviour in this
      patch.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      5b0311e1
    • F
      perf: Make __perf_event_sched_out static · 42cce92f
      Frederic Weisbecker 提交于
      __perf_event_sched_out doesn't need to be globally available, make
      it static.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      42cce92f
  22. 16 1月, 2010 1 次提交
    • F
      perf: Export software-only event group characteristic as a flag · d6f962b5
      Frederic Weisbecker 提交于
      Before scheduling an event group, we first check if a group can go
      on. We first check if the group is made of software only events
      first, in which case it is enough to know if the group can be
      scheduled in.
      
      For that purpose, we iterate through the whole group, which is
      wasteful as we could do this check when we add/delete an event to
      a group.
      
      So we create a group_flags field in perf event that can host
      characteristics from a group of events, starting with a first
      PERF_GROUP_SOFTWARE flag that reduces the check on the fast path.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      d6f962b5