1. 09 Jun 2010 (9 commits)
  2. 31 May 2010 (2 commits)
  3. 21 May 2010 (5 commits)
  4. 19 May 2010 (4 commits)
  5. 11 May 2010 (1 commit)
  6. 07 May 2010 (3 commits)
    • perf: Add group scheduling transactional APIs · 6bde9b6c
      Committed by Lin Ming
      Add group scheduling transactional APIs to struct pmu.
      These APIs will be implemented in arch code, based on Peter's
      idea, quoted below.
      
      > the idea behind hw_perf_group_sched_in() is to not perform
      > schedulability tests on each event in the group, but to add the group
      > as a whole and then perform one test.
      >
      > Of course, when that test fails, you'll have to roll-back the whole
      > group again.
      >
      > So start_txn (or a better name) would simply toggle a flag in the pmu
      > implementation that will make pmu::enable() not perform the
      > schedulability test.
      >
      > Then commit_txn() will perform the schedulability test (so note the
      > method has to have a !void return value).
      >
      > This will allow us to use the regular
      > kernel/perf_event.c::group_sched_in() and all the rollback code.
      > Currently each hw_perf_group_sched_in() implementation duplicates all
      > the rollback code (with various bugs).
      
      ->start_txn:
      Start a group-event scheduling transaction: set a flag so that
      pmu::enable() skips the schedulability test; the test is deferred
      to commit time.
      
      ->commit_txn:
      Commit the group-event scheduling transaction: perform the
      schedulability test on the group as a whole.
      
      ->cancel_txn:
      Abort the group-event scheduling transaction: clear the flag so
      pmu::enable() performs the schedulability test again.
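      
      A minimal sketch of what the extended struct pmu might look like,
      assuming 2010-era callback shapes (members other than the three
      transactional hooks are illustrative, not the exact kernel
      definitions):
      
        /* Hedged sketch: group scheduling transaction hooks. */
        struct pmu {
                int  (*enable)     (struct perf_event *event);
                void (*disable)    (struct perf_event *event);
                void (*read)       (struct perf_event *event);
        
                /* Set a flag so enable() skips the per-event
                 * schedulability test until commit time. */
                void (*start_txn)  (const struct pmu *pmu);
                /* Test the group as a whole; a non-zero return means
                 * the group does not fit and must be rolled back. */
                int  (*commit_txn) (const struct pmu *pmu);
                /* Clear the flag; enable() tests again. */
                void (*cancel_txn) (const struct pmu *pmu);
        };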
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Lin Ming <ming.m.lin@intel.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1272002160.5707.60.camel@minggr.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Improve the PEBS ABI · ab608344
      Committed by Peter Zijlstra
      Rename perf_event_attr::precise to perf_event_attr::precise_ip and
      widen it to 2 bits. This new field describes the required precision of
      the PERF_SAMPLE_IP field:
      
        0 - SAMPLE_IP can have arbitrary skid
        1 - SAMPLE_IP must have constant skid
        2 - SAMPLE_IP requested to have 0 skid
        3 - SAMPLE_IP must have 0 skid
      
      And modify the Intel PEBS code accordingly. The PEBS implementation
      now supports up to precise_ip == 2, where we perform the IP fixup.
      
      Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning; this
      bit should be set for each PERF_SAMPLE_IP field known to match
      the actual instruction triggering the event.
      
      This new scheme allows for a PEBS mode that uses the buffer for more
      than a single event.
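      
      For illustration, a request for the strictest mode described
      above might look like this (a hedged sketch against the
      perf_event_attr ABI; setup and error handling elided):
      
        /* Hedged sketch: ask for zero-skid IP samples. */
        struct perf_event_attr attr = { 0 };
        
        attr.type        = PERF_TYPE_HARDWARE;
        attr.config      = PERF_COUNT_HW_CPU_CYCLES;
        attr.sample_type = PERF_SAMPLE_IP;
        attr.precise_ip  = 2;   /* 0 skid requested: PEBS + IP fixup */
        
        /* Samples whose IP is known to be exact carry
         * PERF_RECORD_MISC_EXACT_IP in header.misc. */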
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephane Eranian <eranian@google.com>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Fix exit() vs PERF_FORMAT_GROUP · 4fd38e45
      Committed by Peter Zijlstra
      Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work
      as expected if the task the counters were attached to quit before
      the read() call.
      
      The cause is that we unconditionally destroy the grouping when we
      remove counters from their context. Fix this by destroying the
      grouping only when the counter itself is freed.
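      
      With PERF_FORMAT_GROUP (PERF_FORMAT_ID also assumed here), one
      read() on the leader returns every member's count, even after the
      monitored task has exited. A hedged userspace sketch of that
      layout for a two-event group:
      
        /* Hedged sketch: group read layout. */
        struct {
                __u64 nr;                         /* number of events */
                struct { __u64 value, id; } v[2];
        } buf;
        
        if (read(group_fd, &buf, sizeof(buf)) > 0)  /* task may be gone */
                printf("id %llu = %llu\n", buf.v[0].id, buf.v[0].value);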
      Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Reported-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1273160566.5605.404.camel@twins>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 20 Apr 2010 (1 commit)
    • perf & kvm: Clean up some of the guest profiling callback API details · dcf46b94
      Committed by Zhang, Yanmin
      Fix a build bug and some programming-style issues:
      
       - use valid C
       - fix up various style details
      Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Sheng Yang <sheng@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Zachary Amsden <zamsden@redhat.com>
      Cc: zhiteng.huang@intel.com
      Cc: tim.c.chen@intel.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <1271729638.2078.624.camel@ymzhang.sh.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 19 Apr 2010 (1 commit)
  9. 15 Apr 2010 (1 commit)
    • perf: Store active software events in a hashlist · 76e1d904
      Committed by Frederic Weisbecker
      Each time a software event triggers, we need to walk through the
      entire list of events from the current cpu and task contexts to
      retrieve a running perf event that matches. We also need to check
      that a matching perf event is actually counting.
      
      This walk is wasteful and makes the event fast path scale poorly
      as the number of events running on the same contexts grows.
      
      To solve this, we store the running perf events in a hashlist, so
      that when they trigger we get immediate access to them, keyed by
      type:event_id; see the lookup sketch after the version notes
      below.
      
      v2: - Fix SWEVENT_HLIST_SIZE definition (and re-learn some basic
            maths along the way)
          - Only allocate hlist for online cpus, but keep track of the
            refcount on offline possible cpus too, so that we allocate it
            if needed when it becomes online.
          - Drop the kref use as it's not adapted to our tricks anymore.
      
      v3: - Fix bad refcount check (address instead of value). Thanks to
            Eric Dumazet who spotted this.
          - While exiting cpu, move the hlist release out of the IPI path
            to lock the hlist mutex sanely.
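      
      A hedged sketch of the lookup this enables (the hash helper and
      key packing are illustrative, not the exact kernel code):
      
        /* Hedged sketch: bucket selection keyed on type:event_id. */
        static struct hlist_head *
        find_swevent_head(struct swevent_hlist *hlist,
                          u64 type, u32 event_id)
        {
                u64 hash = hash_64(type | ((u64)event_id << 32),
                                   SWEVENT_HLIST_BITS);
        
                /* On trigger, walk only this bucket instead of every
                 * event in the cpu and task contexts. */
                return &hlist->heads[hash];
        }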
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
  10. 01 Apr 2010 (1 commit)
    • perf: Use hot regs with software sched switch/migrate events · e49a5bd3
      Committed by Frederic Weisbecker
      Scheduler task-migration events don't work because they always
      pass NULL regs to perf_sw_event(). The event hence gets filtered
      out in perf_swevent_add().
      
      Scheduler context-switch events use task_pt_regs() to get the
      context in which the event occurred, which is the wrong thing to
      do: it gives us not the place in the kernel where we went to
      sleep but the place where we left userspace. The result is even
      more wrong if we switch from a kernel thread.
      
      Use the hot regs snapshot for both events, as they belong to the
      non-interrupt/exception-based family of events. Unlike page
      faults and the like, which provide the regs matching the exact
      origin of the event, here we need to save the current context.
      
      This makes the task migration event work and fixes the context
      switch callchains and origin IP.
      
      Example: perf record -a -e cs
      
      Before:
      
          10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                      |
                      --- (nil)
                          perf_callchain
                          perf_prepare_sample
                          __perf_event_overflow
                          perf_swevent_overflow
                          perf_swevent_add
                          perf_swevent_ctx_event
                          do_perf_sw_event
                          __perf_sw_event
                          perf_event_task_sched_out
                          schedule
                          run_ksoftirqd
                          kthread
                          kernel_thread_helper
      
      After:
      
          23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
                  |
                  --- schedule
                     |
                     |--60.00%-- schedule_timeout
                     |          wait_for_common
                     |          wait_for_completion
                     |          blk_execute_rq
                     |          scsi_execute
                     |          scsi_execute_req
                     |          sr_test_unit_ready
                     |          |
                     |          |--66.67%-- sr_media_change
                     |          |          media_changed
                     |          |          cdrom_media_changed
                     |          |          sr_block_media_changed
                     |          |          check_disk_change
                     |          |          cdrom_open
      
      v2: Always build perf_arch_fetch_caller_regs() now that software
      events need it too. They don't need it from modules, unlike trace
      events, so we keep the EXPORT_SYMBOL in trace_event_perf.c. A
      hedged call-site sketch follows.
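      
      The call-site pattern this change implies, roughly (a hedged
      sketch; the perf_sw_event() argument order follows this era's
      kernel, and the wrapper function is illustrative):
      
        /* Hedged sketch: take a hot regs snapshot at the event's
         * origin instead of passing NULL or task_pt_regs(). */
        static inline void sched_migrate_swevent(void)
        {
                struct pt_regs regs;
        
                perf_fetch_caller_regs(&regs);
                perf_sw_event(PERF_COUNT_SW_CPU_MIGRATIONS,
                              1, 1, &regs, 0);
        }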
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
  11. 11 Mar 2010 (1 commit)
    • perf, ppc: Fix compile error due to new cpu notifiers · 85cfabbc
      Committed by Peter Zijlstra
      Fix:
      
        arch/powerpc/kernel/perf_event.c:1334: error: 'power_pmu_notifier' undeclared (first use in this function)
        arch/powerpc/kernel/perf_event.c:1334: error: (Each undeclared identifier is reported only once
        arch/powerpc/kernel/perf_event.c:1334: error: for each function it appears in.)
        arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'power_pmu_notifier'
        arch/powerpc/kernel/perf_event.c:1334: error: implicit declaration of function 'register_cpu_notifier'
      
      Due to commit 3f6da390 (perf: Rework and fix the arch CPU-hotplug hooks).
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <new-submission>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  12. 10 Mar 2010 (6 commits)
    • perf: Introduce new perf_fetch_caller_regs() for hot regs snapshot · 5331d7b8
      Committed by Frederic Weisbecker
      Events that trigger overflows by interrupting a context can use
      get_irq_regs() or task_pt_regs() to retrieve the state at the
      time the event triggered. But this is not the case for some other
      classes of events, such as trace events, as tracepoints are
      executed in the same context as the code that triggered the
      event.
      
      This means we need a different API to capture the regs there,
      namely a hot snapshot that grabs the most important information
      for perf: the instruction pointer to get the event origin, the
      frame pointer for the callchain, the code segment for
      user_mode() tests (we always use __KERNEL_CS, as trace events
      always occur in the kernel), and the flags for further use; a
      hedged usage sketch follows the version note.
      
      v2: rename perf_save_regs to perf_fetch_caller_regs, as per
      Masami's suggestion.
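      
      A hedged usage sketch (what the snapshot fills is taken from the
      description above; the exact implementation is arch asm):
      
        /* Hedged sketch: hot regs snapshot taken at the call site. */
        struct pt_regs regs;
        
        perf_fetch_caller_regs(&regs);
        
        /* On x86 this populates roughly:
         *   regs.ip    - instruction pointer: the event origin
         *   regs.bp    - frame pointer, for the callchain
         *   regs.cs    - __KERNEL_CS, for user_mode() tests
         *   regs.flags - the flags, kept for further use
         * The snapshot is then handed to the overflow path in place
         * of get_irq_regs()/task_pt_regs(). */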
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Archs <linux-arch@vger.kernel.org>
    • perf, x86: use LBR for PEBS IP+1 fixup · ef21f683
      Committed by Peter Zijlstra
      Use the LBR to fix up the PEBS IP+1 issue.
      
      As noted, PEBS reports the next instruction; here we use the LBR
      to find the last branch and from that reconstruct the actual IP.
      If the IP matches the LBR-TO, we use LBR-FROM; otherwise we use
      the LBR-TO address as the beginning of the last basic block and
      decode forward.
      
      Once we find a match to the current IP, we use the previous
      location; see the sketch after the failure list below.
      
      This patch introduces a new ABI element: PERF_RECORD_MISC_EXACT, which
      conveys that the reported IP (PERF_SAMPLE_IP) is the exact instruction
      that caused the event (barring CPU errata).
      
      The fixup can fail due to various reasons:
      
       1) LBR contains invalid data (quite possible)
       2) part of the basic block got paged out
       3) the reported IP isn't part of the basic block (see 1)
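      
      A hedged sketch of the fixup walk described above (the
      insn_length() helper is hypothetical; the real code uses the
      kernel instruction decoder):
      
        /* Hedged sketch: derive the triggering IP from the last
         * branch record and the PEBS-reported IP (ip == insn + 1). */
        static u64 pebs_fixup_ip(u64 ip, u64 lbr_from, u64 lbr_to)
        {
                u64 prev = lbr_to, cur = lbr_to;
        
                if (ip == lbr_to)       /* the branch insn itself fired */
                        return lbr_from;
        
                while (cur < ip) {      /* decode the basic block forward */
                        prev = cur;
                        cur += insn_length(cur);   /* hypothetical */
                }
        
                /* cur == ip: the previous insn triggered the event;
                 * overshoot means the fixup failed (cases 1-3 above). */
                return cur == ip ? prev : ip;
        }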
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <20100304140100.619375431@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Implement simple LBR support · caff2bef
      Committed by Peter Zijlstra
      Implement simple support for the Intel Last-Branch-Record (LBR).
      It supports all hardware that implements FREEZE_LBRS_ON_PMI, but
      does not (yet) implement the LBR config register.
      
      The Intel LBR is a FIFO of From,To address pairs describing the
      last few branches the hardware took.
      
      This patch does not add a perf interface to the LBR; it merely
      provides an interface for internal use.
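      
      A hedged sketch of the internal per-cpu state such an interface
      might keep (field names and FIFO depth are illustrative):
      
        /* Hedged sketch: the LBR as a FIFO of From,To pairs. */
        struct lbr_entry {
                u64 from, to;
        };
        
        struct cpu_lbr_state {
                int              lbr_users;     /* internal refcount */
                int              lbr_nr;        /* valid entries     */
                struct lbr_entry lbr_stack[16];
        };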
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <20100304140100.544191154@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf, x86: Add PEBS infrastructure · ca037701
      Committed by Peter Zijlstra
      This patch implements support for Intel Precise Event Based
      Sampling (PEBS), an alternative counter mode in which the counter
      triggers a hardware assist to collect information on events. The
      hardware assist takes a trap-like snapshot of a subset of the
      machine registers.
      
      This data is written to the Intel Debug-Store, which can be
      programmed with a data threshold at which to raise a PMI (see the
      sketch after this description).
      
      Since the PEBS hardware assist is trap-like, the reported IP is
      always one instruction after the actual instruction that
      triggered the event.
      
      This implements a simple PEBS model that always takes a single
      PEBS event at a time. This is done so that the interaction with
      the rest of the system is as expected (freq adjust, period
      randomization, lbr, callchains, etc.).
      
      It adds an ABI element, perf_event_attr::precise, which indicates
      that we wish to use this (constrained, but precise) mode.
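      
      A hedged sketch of the debug-store bookkeeping involved (the
      layout below is illustrative, not the exact Intel DS format):
      
        /* Hedged sketch: the hardware assist appends records to a
         * kernel buffer and raises a PMI once the write cursor
         * crosses a programmed threshold. */
        struct debug_store_sketch {
                u64 pebs_buffer_base;          /* records start here  */
                u64 pebs_index;                /* hw write cursor     */
                u64 pebs_absolute_maximum;     /* end of the buffer   */
                u64 pebs_interrupt_threshold;  /* raise PMI past this */
        };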
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      LKML-Reference: <20100304140100.392111285@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Rework and fix the arch CPU-hotplug hooks · 3f6da390
      Committed by Peter Zijlstra
      Remove the hw_perf_event_*() hotplug hooks in favour of per-PMU
      hotplug notifiers. This has the advantage of reducing the static
      weak interface as well as exposing all hotplug actions to the
      PMU.
      
      Use this to fix x86 hotplug usage, where we did things in ONLINE
      that should have been done in UP_PREPARE or STARTING.
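      
      A hedged sketch of a per-PMU notifier in the 2010-era
      cpu-notifier style (the my_pmu_* helpers are hypothetical):
      
        /* Hedged sketch: one notifier per PMU, seeing every action. */
        static int my_pmu_cpu_notify(struct notifier_block *self,
                                     unsigned long action, void *hcpu)
        {
                unsigned int cpu = (long)hcpu;
        
                switch (action & ~CPU_TASKS_FROZEN) {
                case CPU_UP_PREPARE:
                        my_pmu_cpu_prepare(cpu);  /* alloc per-cpu state */
                        break;
                case CPU_STARTING:
                        my_pmu_cpu_starting(cpu); /* program the hw on
                                                     the cpu itself */
                        break;
                case CPU_DEAD:
                        my_pmu_cpu_dead(cpu);     /* free per-cpu state */
                        break;
                }
                return NOTIFY_OK;
        }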
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: paulus@samba.org
      Cc: eranian@google.com
      Cc: robert.richter@amd.com
      Cc: fweisbec@gmail.com
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      LKML-Reference: <20100305154128.736225361@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf: Provide generic perf_sample_data initialization · dc1d628a
      Committed by Peter Zijlstra
      This makes it easier to extend perf_sample_data and fixes a bug
      on arm and sparc, which failed to set ->raw to NULL; that can
      cause crashes when combined with PERF_SAMPLE_RAW.
      
      It also optimizes PowerPC and tracepoint events, where the
      previous struct initialization forced the whole structure to be
      zeroed.
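      
      A minimal sketch of such an initializer, assuming it only needs
      to set the fields that must never be left stale:
      
        /* Hedged sketch: initialize only what matters instead of
         * zeroing the whole structure. */
        static inline void
        perf_sample_data_init(struct perf_sample_data *data, u64 addr)
        {
                data->addr = addr;
                data->raw  = NULL;  /* the arm/sparc bug: never leave
                                     * ->raw dangling */
        }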
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jean Pihet <jpihet@mvista.com>
      Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Jamie Iles <jamie.iles@picochip.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: stable@kernel.org
      LKML-Reference: <20100304140100.315416040@chello.nl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 02 Mar 2010 (1 commit)
  14. 28 Feb 2010 (1 commit)
  15. 26 Feb 2010 (2 commits)
    • perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in() · 6e37738a
      Committed by Peter Zijlstra
      Since the cpu argument to hw_perf_group_sched_in() is always
      smp_processor_id(), simplify the code a little by removing the
      argument and using the current cpu where needed.
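      
      The prototype change, roughly (a hedged sketch; parameter names
      follow this era's kernel but are not verbatim):
      
        /* Before: */
        int hw_perf_group_sched_in(struct perf_event *group_leader,
                                   struct perf_cpu_context *cpuctx,
                                   struct perf_event_context *ctx,
                                   int cpu);
        /* After: the redundant cpu argument is gone; callers use
         * smp_processor_id() internally where a cpu is needed. */
        int hw_perf_group_sched_in(struct perf_event *group_leader,
                                   struct perf_cpu_context *cpuctx,
                                   struct perf_event_context *ctx);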
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1265890918.5396.3.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • perf_events: Add new start/stop PMU callbacks · d76a0812
      Committed by Stephane Eranian
      In certain situations, the kernel may need to stop and start the
      same event rapidly. The current PMU callbacks do not distinguish
      between stop and release (i.e., stop + free the resource). Thus,
      a counter may be released and then immediately re-acquired. Event
      scheduling will again take place with no guarantee of being
      assigned the same counter. On some processors, this may even lead
      to a failure to re-assign the event, due to competition between
      cores.
      
      This patch adds a new pair of callbacks to stop and restart a
      counter without actually releasing the underlying counter
      resource. On stop, the counter is stopped and its value saved;
      that's it. On start, the value is reloaded and the counter
      restarted (on x86, the actual restart is delayed until
      perf_enable()).
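      
      A hedged sketch of how the new pair sits next to the existing
      acquire/release pair in struct pmu (other members elided; the
      comments paraphrase the semantics above):
      
        /* Hedged sketch: stop/start keep the counter resource. */
        struct pmu {
                int  (*enable) (struct perf_event *event); /* acquire + start */
                void (*disable)(struct perf_event *event); /* stop + release  */
        
                void (*start)  (struct perf_event *event); /* reload value and
                                                              run, counter kept */
                void (*stop)   (struct perf_event *event); /* save value and
                                                              halt, counter kept */
        };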
      Signed-off-by: Stephane Eranian <eranian@google.com>
      [ added fallback to ->enable/->disable for all other PMUs
        fixed x86_pmu_start() to call x86_pmu.enable()
        merged __x86_pmu_disable into x86_pmu_stop() ]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <4b703875.0a04d00a.7896.ffffb824@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  16. 04 Feb 2010 (1 commit)
    • perf_events, x86: Fix bug in hw_perf_enable() · 447a194b
      Committed by Stephane Eranian
      We cannot assume that because hwc->idx == assign[i], we can avoid
      reprogramming the counter in hw_perf_enable().
      
      The event may have been scheduled out, and another event may have
      been programmed into this counter. Thus, we need a more robust
      way of verifying whether the counter still contains config/data
      related to an event.
      
      This patch adds a generation number to each counter on each cpu.
      Using this mechanism we can reliably verify whether the content
      of a counter corresponds to an event.
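      
      A hedged sketch of the verification (names illustrative; the idea
      is that the index alone is no longer proof of ownership):
      
        /* Hedged sketch: trust a counter's contents only if both the
         * index and the per-cpu generation tag still match what this
         * event last programmed. */
        static bool counter_still_matches(struct cpu_hw_events *cpuc,
                                          struct hw_perf_event *hwc,
                                          int idx)
        {
                return hwc->idx == idx &&
                       hwc->last_tag == cpuc->tags[idx];
        }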
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <4b66dc67.0b38560a.1635.ffffae18@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>