1. 08 Jun 2012, 1 commit
  2. 09 May 2012, 1 commit
  3. 26 Apr 2012, 1 commit
  4. 24 Mar 2012, 1 commit
  5. 23 Mar 2012, 1 commit
    • perf: Fix mmap_page capabilities and docs · c7206205
      Committed by Peter Zijlstra
      Complete the syscall-less self-profiling feature and address
      all complaints, namely:
      
       - capabilities, so we can detect what is actually available at runtime
      
           Add a capabilities field to perf_event_mmap_page to indicate
           what is actually available for use.
      
       - on x86: RDPMC weirdness due to being 40/48 bits and not sign-extending
         properly.
      
       - ABI documentation as to how all this stuff works.
      
      Also improve the documentation for the new features.
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vweaver1@eecs.utk.edu>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
       Link: http://lkml.kernel.org/r/1332433596.2487.33.camel@twins
       Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c7206205
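       The documented read loop, as a user-space sketch (field names
       cap_usr_rdpmc and pmc_width are the ones this commit adds to
       perf_event_mmap_page; later kernels rename cap_usr_rdpmc to
       cap_user_rdpmc, and the rdpmc() wrapper below is a local helper,
       not a library call):

          #include <stdint.h>
          #include <linux/perf_event.h>

          #define barrier() __asm__ __volatile__("" ::: "memory")

          /* wrapper around the x86 RDPMC instruction */
          static inline uint64_t rdpmc(uint32_t counter)
          {
              uint32_t lo, hi;
              __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(counter));
              return lo | ((uint64_t)hi << 32);
          }

          /* pc points at the mmap()ed first page of a perf event fd */
          static int64_t self_read(volatile struct perf_event_mmap_page *pc)
          {
              uint32_t seq, idx;
              int64_t count;

              do {
                  seq = pc->lock;
                  barrier();
                  idx = pc->index;
                  count = pc->offset;
                  if (pc->cap_usr_rdpmc && idx) {
                      /* sign-extend the 40/48-bit counter value */
                      int64_t pmc = rdpmc(idx - 1);
                      pmc <<= 64 - pc->pmc_width;
                      pmc >>= 64 - pc->pmc_width;
                      count += pmc;
                  }
                  barrier();
              } while (pc->lock != seq);

              return count;
          }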
  6. 17 Mar 2012, 1 commit
  7. 05 Mar 2012, 3 commits
    • perf: Add callback to flush branch_stack on context switch · d010b332
      Committed by Stephane Eranian
      With branch stack sampling, it is possible to filter by priv levels.
      
      In system-wide mode, that means it is possible to capture only user
      level branches. The builtin SW LBR filter needs to disassemble code
      based on LBR captured addresses. For that, it needs to know the task
      the addresses are associated with. Because of context switches, the
      content of the branch stack buffer may contain addresses from
      different tasks.
      
       We need a callback on context switch to either flush the branch stack
       or save it. This patch adds a new callback in struct pmu which is called
       during context switches. The callback is invoked only when necessary,
       that is, when a system-wide context has at least one event which
       uses PERF_SAMPLE_BRANCH_STACK. The callback is never called for a
       per-thread context (see the sketch after this entry).
      
      In this version, the Intel x86 code simply flushes (resets) the LBR
      on context switches (fills it with zeroes). Those zeroed branches are
      then filtered out by the SW filter.
       Signed-off-by: Stephane Eranian <eranian@google.com>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Link: http://lkml.kernel.org/r/1328826068-11713-11-git-send-email-eranian@google.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d010b332
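       A rough model of the mechanism (names and the users counter are
       illustrative; the kernel tracks branch-stack users per CPU context
       and wires the callback through struct pmu):

          #include <stdbool.h>
          #include <stdio.h>

          /* a pmu gains an optional hook that the core invokes on
           * context switch, but only while at least one system-wide
           * event is using PERF_SAMPLE_BRANCH_STACK */
          struct pmu {
              void (*flush_branch_stack)(void);
          };

          static int branch_stack_users;   /* per-cpu in the kernel */

          static void lbr_flush(void) { puts("LBR cleared"); }

          static void context_switch(struct pmu *pmu)
          {
              if (branch_stack_users > 0 && pmu->flush_branch_stack)
                  pmu->flush_branch_stack();   /* only when needed */
          }

          int main(void)
          {
              struct pmu intel = { .flush_branch_stack = lbr_flush };
              branch_stack_users = 1;
              context_switch(&intel);   /* stale entries would belong
                                           to the previous task */
              return 0;
          }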
    • S
      perf/x86: Sync branch stack sampling with precise_sampling · ff3fb511
      Stephane Eranian 提交于
      If precise sampling is enabled on Intel x86 then perf_event uses PEBS.
      To correct for the off-by-one error of PEBS, perf_event uses LBR when
      precise_sample > 1.
      
      On Intel x86 PERF_SAMPLE_BRANCH_STACK is implemented using LBR,
      therefore both features must be coordinated as they may not
      configure LBR the same way.
      
      For PEBS, LBR needs to capture all branches at the priv level of
      the associated event.
      
       This patch checks that the branch type and priv level of BRANCH_STACK
       are compatible with those of the PEBS LBR requirement, thereby allowing:
      
         $ perf record -b any,u -e instructions:upp ....
      
      But:
      
         $ perf record -b any_call,u -e instructions:upp
      
       is not possible.
       Signed-off-by: Stephane Eranian <eranian@google.com>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Link: http://lkml.kernel.org/r/1328826068-11713-5-git-send-email-eranian@google.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      ff3fb511
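       A toy model of the compatibility rule (the flag values below are
       made up; the kernel uses the PERF_SAMPLE_BRANCH_* bits and applies
       the check in the x86 event setup path):

          #include <stdbool.h>

          #define BR_ANY    0x1   /* all branch types */
          #define BR_USER   0x2
          #define BR_KERNEL 0x4

          /* PEBS error correction needs the LBR to record *all* branches
           * at the priv level of the event, so a branch-stack request on
           * the same event must not narrow the type filter or change the
           * priv level. */
          static bool branch_stack_compatible(int br_type, int br_priv,
                                              int ev_priv)
          {
              if (!(br_type & BR_ANY))
                  return false;        /* e.g. any_call: narrower, reject */
              return br_priv == ev_priv;
          }

          int main(void)
          {
              /* perf record -b any,u      -e instructions:upp -> accepted */
              /* perf record -b any_call,u -e instructions:upp -> rejected */
              bool ok1 = branch_stack_compatible(BR_ANY, BR_USER, BR_USER);
              bool ok2 = branch_stack_compatible(0,      BR_USER, BR_USER);
              return (ok1 && !ok2) ? 0 : 1;
          }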
    • perf/x86: Add Intel LBR sharing logic · b36817e8
      Committed by Stephane Eranian
       The Intel LBR on some recent processors is capable
       of filtering branches by type. The filter is configurable
       via the LBR_SELECT MSR.
       
       There are limitations on how this register can be used.
      
      On Nehalem/Westmere, the LBR_SELECT is shared by the two HT threads
      when HT is on. It is private to each core when HT is off.
      
      On SandyBridge, the LBR_SELECT register is private to each thread
      when HT is on. It is private to each core when HT is off.
      
      The kernel must manage the sharing of LBR_SELECT. It allows
      multiple users on the same logical CPU to use LBR_SELECT as
      long as they program it with the same value. Across sibling
      CPUs (HT threads), the same restriction applies on NHM/WSM.
      
      This patch implements this sharing logic by leveraging the
      mechanism put in place for managing the offcore_response
      shared MSR.
      
      We modify __intel_shared_reg_get_constraints() to cause
      x86_get_event_constraint() to be called because LBR may
      be associated with events that may be counter constrained.
       Signed-off-by: Stephane Eranian <eranian@google.com>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Link: http://lkml.kernel.org/r/1328826068-11713-4-git-send-email-eranian@google.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b36817e8
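       A simplified model of the sharing rule (the kernel version lives
       in the intel_shared_regs machinery, takes a spinlock and tracks
       per-core state; names here are illustrative):

          #include <stdbool.h>
          #include <stdint.h>
          #include <stdio.h>

          /* concurrent users may share LBR_SELECT only if they program
           * the identical value */
          struct shared_msr {
              uint64_t config;
              int refcnt;
          };

          static bool shared_msr_get(struct shared_msr *m, uint64_t config)
          {
              if (m->refcnt && m->config != config)
                  return false;        /* conflicting filter: reject */
              m->config = config;
              m->refcnt++;
              return true;
          }

          static void shared_msr_put(struct shared_msr *m) { m->refcnt--; }

          int main(void)
          {
              struct shared_msr lbr_select = { 0, 0 };
              printf("%d\n", shared_msr_get(&lbr_select, 0x3)); /* 1: first user */
              printf("%d\n", shared_msr_get(&lbr_select, 0x3)); /* 1: same value */
              printf("%d\n", shared_msr_get(&lbr_select, 0x7)); /* 0: conflict   */
              shared_msr_put(&lbr_select);
              return 0;
          }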
  8. 21 Feb 2012, 1 commit
  9. 07 Feb 2012, 1 commit
    • perf: Fix double start/stop in x86_pmu_start() · f39d47ff
      Committed by Stephane Eranian
       This patch fixes a bug introduced by the following
       commit:
      
              e050e3f0 ("perf: Fix broken interrupt rate throttling")
      
      The patch caused the following warning to pop up depending on
      the sampling frequency adjustments:
      
        ------------[ cut here ]------------
        WARNING: at arch/x86/kernel/cpu/perf_event.c:995 x86_pmu_start+0x79/0xd4()
      
      It was caused by the following call sequence:
      
      perf_adjust_freq_unthr_context.part() {
           stop()
           if (delta > 0) {
                perf_adjust_period() {
                    if (period > 8*...) {
                        stop()
                        ...
                        start()
                    }
                }
            }
            start()
      }
      
      Which caused a double start and a double stop, thus triggering
      the assert in x86_pmu_start().
      
      The patch fixes the problem by avoiding the double calls. We
      pass a new argument to perf_adjust_period() to indicate whether
      or not the event is already stopped. We can't just remove the
      start/stop from that function because it's called from
      __perf_event_overflow where the event needs to be reloaded via a
       back-to-back stop/start call.
      
      The patch reintroduces the assertion in x86_pmu_start() which
      was removed by commit:
      
      	84f2b9b2 ("perf: Remove deprecated WARN_ON_ONCE()")
      
      In this second version, we've added calls to disable/enable PMU
      during unthrottling or frequency adjustment based on bug report
      of spurious NMI interrupts from Eric Dumazet.
       Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
       Signed-off-by: Stephane Eranian <eranian@google.com>
       Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: markus@trippelsdorf.de
      Cc: paulus@samba.org
      Link: http://lkml.kernel.org/r/20120207133956.GA4932@quad
      [ Minor edits to the changelog and to the code ]
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f39d47ff
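       A self-contained model of the fix (not the kernel code; the real
       function is perf_adjust_period(), and the flag tells it whether it
       must issue the stop/start pair itself):

          #include <stdbool.h>
          #include <stdio.h>

          struct event { bool running; };

          static void stop(struct event *e)
          {
              if (!e->running)
                  puts("WARN: double stop");   /* the x86_pmu_start() assert */
              e->running = false;
          }

          static void start(struct event *e)
          {
              if (e->running)
                  puts("WARN: double start");
              e->running = true;
          }

          static void adjust_period(struct event *e, long period,
                                    bool need_restart)
          {
              if (period > 8) {                /* "period grew too large" */
                  if (need_restart)
                      stop(e);
                  /* ... reprogram the sample period ... */
                  if (need_restart)
                      start(e);
              }
          }

          int main(void)
          {
              struct event e = { .running = true };

              /* unthrottling path: caller already stopped the event */
              stop(&e);
              adjust_period(&e, 100, false);
              start(&e);

              /* overflow path: event running, needs the stop/start pair */
              adjust_period(&e, 100, true);
              return 0;
          }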
  10. 03 Feb 2012, 1 commit
    • perf: Remove deprecated WARN_ON_ONCE() · 84f2b9b2
      Committed by Stephane Eranian
      With the new throttling/unthrottling code introduced with
      commit:
      
        e050e3f0 ("perf: Fix broken interrupt rate throttling")
      
       we occasionally hit WARN_ON_ONCE() checks in:
      
        - intel_pmu_pebs_enable()
        - intel_pmu_lbr_enable()
        - x86_pmu_start()
      
      The assertions are no longer problematic. There is a valid
      path where they can trigger but it is harmless.
      
      The assertion can be triggered with:
      
        $ perf record -e instructions:pp ....
      
      Leading to paths:
      
        intel_pmu_pebs_enable
        intel_pmu_enable_event
        x86_perf_event_set_period
        x86_pmu_start
        perf_adjust_freq_unthr_context
        perf_event_task_tick
        scheduler_tick
      
      And:
      
        intel_pmu_lbr_enable
        intel_pmu_enable_event
        x86_perf_event_set_period
        x86_pmu_start
         perf_adjust_freq_unthr_context
        perf_event_task_tick
        scheduler_tick
      
      cpuc->enabled is always on because when we get to
      perf_adjust_freq_unthr_context() the PMU is not totally
      disabled. Furthermore when we need to adjust a period,
      we only stop the event we need to change and not the
      entire PMU. Thus, when we re-enable, cpuc->enabled is
      already set. Note that when we stop the event, both
      pebs and lbr are stopped if necessary (and possible).
       Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: peterz@infradead.org
       Link: http://lkml.kernel.org/r/20120202110401.GA30911@quad
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      84f2b9b2
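       A toy model of why the path is harmless (stand-in types; the
       removed checks essentially asserted that the PMU was globally
       disabled at enable time):

          #include <stdbool.h>
          #include <stdio.h>

          struct cpu_state { bool pmu_enabled; };  /* models cpuc->enabled */
          struct event     { bool running; };

          /* stopping one event does not disable the whole PMU */
          static void event_stop(struct event *e) { e->running = false; }

          static void event_start(struct cpu_state *c, struct event *e)
          {
              /* the removed assertion fires here although the sequence
               * below is perfectly valid */
              if (c->pmu_enabled)
                  puts("assertion would fire (harmlessly)");
              e->running = true;
          }

          int main(void)
          {
              struct cpu_state c = { .pmu_enabled = true };
              struct event e = { .running = true };

              event_stop(&e);       /* tick-time frequency adjustment ... */
              event_start(&c, &e);  /* ... restarts with the PMU still on */
              return 0;
          }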
  11. 21 Dec 2011, 4 commits
  12. 07 Dec 2011, 2 commits
  13. 06 Dec 2011, 3 commits
    • perf, x86: Prefer fixed-purpose counters when scheduling · 4defea85
      Committed by Peter Zijlstra
      This avoids a scheduling failure for cases like:
      
        cycles, cycles, instructions, instructions (on Core2)
      
      Which would end up being programmed like:
      
        PMC0, PMC1, FP-instructions, fail
      
      Because all events will have the same weight.
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Link: http://lkml.kernel.org/n/tip-8tnwb92asqj7xajqqoty4gel@git.kernel.org
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4defea85
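       A toy model of the preference (counter layout is illustrative:
       indices 0-1 general purpose, 2-3 fixed; the point is that the
       fixed range is scanned first):

          #include <stdio.h>

          #define NCOUNTERS   4
          #define FIRST_FIXED 2

          static int pick_counter(unsigned mask, unsigned *used)
          {
              /* try fixed-purpose counters first ... */
              for (int i = FIRST_FIXED; i < NCOUNTERS; i++) {
                  if ((mask & 1u << i) && !(*used & 1u << i)) {
                      *used |= 1u << i;
                      return i;
                  }
              }
              /* ... then the general-purpose ones */
              for (int i = 0; i < FIRST_FIXED; i++) {
                  if ((mask & 1u << i) && !(*used & 1u << i)) {
                      *used |= 1u << i;
                      return i;
                  }
              }
              return -1;
          }

          int main(void)
          {
              unsigned used = 0;
              unsigned cyc = 0xB;  /* PMC0, PMC1 or the fixed cycles counter */
              unsigned ins = 0x7;  /* PMC0, PMC1 or the fixed instr counter  */
              int a = pick_counter(cyc, &used);
              int b = pick_counter(cyc, &used);
              int c = pick_counter(ins, &used);
              int d = pick_counter(ins, &used);
              /* prints 3 0 2 1: all four events fit; a lowest-index-first
               * scan would leave the last event without a counter */
              printf("%d %d %d %d\n", a, b, c, d);
              return 0;
          }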
    • perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Committed by Robert Richter
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
       The scheduler then fails to find an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
       The event scheduler may not select the correct counter in the first
       cycle because it needs to know which subsequent events will be
       scheduled. It may then fail to schedule the events.
      
       To solve this, we now save the scheduler state of events with
       overlapping counter constraints. If we fail to schedule the events,
       we roll back to those states and try another free counter.
      
       Constraints with overlapping counters are marked with a newly introduced
      overlap flag. We set the overlap flag for such constraints to give the
      scheduler a hint which events to select for counter rescheduling. The
      EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
       Care must be taken as the rescheduling algorithm is O(n!), which
       can dramatically increase scheduling cycles on an over-committed
       system. The number of such EVENT_CONSTRAINT_OVERLAP() macros and
       their counter masks must be kept to a minimum. Thus, the current
       stack is limited to 2 states to bound the number of loops the
       algorithm takes in the worst case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
      
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
       Signed-off-by: Robert Richter <robert.richter@amd.com>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Cc: Stephane Eranian <eranian@google.com>
       Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      bc1738f6
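       A toy backtracking scheduler reproducing the example above (the
       kernel keeps an explicit two-entry state stack and only rolls back
       at constraints marked with the new overlap flag; plain recursion
       is used here for brevity):

          #include <stdio.h>

          /* event 0x02E -> PMC[3,0], events 0x043/0x045/0x046 -> PMC[2:0] */
          static const unsigned mask[4] = { 0x09, 0x07, 0x07, 0x07 };
          static int assign[4];

          static int schedule(int ev, unsigned used)
          {
              if (ev == 4)
                  return 1;                     /* all events placed */
              for (int c = 0; c < 4; c++) {
                  if ((mask[ev] & 1u << c) && !(used & 1u << c)) {
                      assign[ev] = c;
                      if (schedule(ev + 1, used | 1u << c))
                          return 1;             /* keep this choice */
                      /* rollback: try the next counter for ev */
                  }
              }
              return 0;
          }

          int main(void)
          {
              if (schedule(0, 0))
                  for (int i = 0; i < 4; i++)
                      printf("event %d -> PMC%d\n", i, assign[i]);
              /* greedy first-fit places the events on PMC0,1,2 and fails
               * on the last one; backtracking finds PMC3,0,1,2 */
              return 0;
          }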
    • perf, x86: Implement event scheduler helper functions · 1e2ad28f
      Committed by Robert Richter
       This patch introduces helper functions for the x86 perf event
       scheduler code. We need them later to add more complex functionality
       supporting overlapping counter constraints (next patch).
      
      The algorithm is modified so that the range of weight values is now
      generated from the constraints. There shouldn't be other functional
      changes.
      
       The scheduler is controlled through these helper functions: there
       are functions to initialize it, traverse the event list, find unused
       counters, etc. The scheduler keeps its own state.
      
      V3:
      * Added macro for_each_set_bit_cont().
      * Changed functions interfaces of perf_sched_find_counter() and
        perf_sched_next_event() to use bool as return value.
      * Added some comments to make code better understandable.
      
      V4:
       * Fix broken event assignment if the weight of the first event is
         not wmin (perf_sched_init()).
       Signed-off-by: Robert Richter <robert.richter@amd.com>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Cc: Stephane Eranian <eranian@google.com>
       Link: http://lkml.kernel.org/r/1321616122-1533-2-git-send-email-robert.richter@amd.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      1e2ad28f
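       A minimal model of the weight-ordered walk the helpers implement
       (illustrative masks; in the new helpers the weight range is derived
       from the constraints and events are visited per weight class):

          #include <stdio.h>

          static const unsigned mask[3] = { 0x1, 0x3, 0x7 };  /* weights 1,2,3 */

          static int popcount(unsigned v)
          {
              int n = 0;
              for (; v; v &= v - 1)
                  n++;
              return n;
          }

          int main(void)
          {
              int wmin = 32, wmax = 0;
              for (int i = 0; i < 3; i++) {     /* init: derive the range */
                  int w = popcount(mask[i]);
                  if (w < wmin) wmin = w;
                  if (w > wmax) wmax = w;
              }
              /* most constrained (lowest weight) events go first */
              for (int w = wmin; w <= wmax; w++)
                  for (int i = 0; i < 3; i++)
                      if (popcount(mask[i]) == w)
                          printf("schedule event %d (weight %d)\n", i, w);
              return 0;
          }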
  14. 14 Nov 2011, 2 commits
  15. 10 Oct 2011, 1 commit
  16. 26 Sep 2011, 1 commit
  17. 31 Aug 2011, 1 commit
    • x86, perf: Check that current->mm is alive before getting user callchain · 20afc60f
      Committed by Andrey Vagin
      An event may occur when an mm is already released.
      
      I added an event in dequeue_entity() and caught a panic with
      the following backtrace:
      
      [  434.421110] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
      [  434.421258] IP: [<ffffffff810464ac>] __get_user_pages_fast+0x9c/0x120
      ...
      [  434.421258] Call Trace:
      [  434.421258]  [<ffffffff8101ae81>] copy_from_user_nmi+0x51/0xf0
      [  434.421258]  [<ffffffff8109a0d5>] ? sched_clock_local+0x25/0x90
      [  434.421258]  [<ffffffff8101b048>] perf_callchain_user+0x128/0x170
      [  434.421258]  [<ffffffff811154cd>] ? __perf_event_header__init_id+0xed/0x100
      [  434.421258]  [<ffffffff81116690>] perf_prepare_sample+0x200/0x280
      [  434.421258]  [<ffffffff81118da8>] __perf_event_overflow+0x1b8/0x290
      [  434.421258]  [<ffffffff81065240>] ? tg_shares_up+0x0/0x670
      [  434.421258]  [<ffffffff8104fe1a>] ? walk_tg_tree+0x6a/0xb0
      [  434.421258]  [<ffffffff81118f44>] perf_swevent_overflow+0xc4/0xf0
      [  434.421258]  [<ffffffff81119150>] do_perf_sw_event+0x1e0/0x250
      [  434.421258]  [<ffffffff81119204>] perf_tp_event+0x44/0x70
      [  434.421258]  [<ffffffff8105701f>] ftrace_profile_sched_block+0xdf/0x110
      [  434.421258]  [<ffffffff8106121d>] dequeue_entity+0x2ad/0x2d0
      [  434.421258]  [<ffffffff810614ec>] dequeue_task_fair+0x1c/0x60
      [  434.421258]  [<ffffffff8105818a>] dequeue_task+0x9a/0xb0
      [  434.421258]  [<ffffffff810581e2>] deactivate_task+0x42/0xe0
      [  434.421258]  [<ffffffff814bc019>] thread_return+0x191/0x808
      [  434.421258]  [<ffffffff81098a44>] ? switch_task_namespaces+0x24/0x60
      [  434.421258]  [<ffffffff8106f4c4>] do_exit+0x464/0x910
      [  434.421258]  [<ffffffff8106f9c8>] do_group_exit+0x58/0xd0
      [  434.421258]  [<ffffffff8106fa57>] sys_exit_group+0x17/0x20
      [  434.421258]  [<ffffffff8100b202>] system_call_fastpath+0x16/0x1b
       Signed-off-by: Andrey Vagin <avagin@openvz.org>
       Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
       Cc: stable@kernel.org
       Link: http://lkml.kernel.org/r/1314693156-24131-1-git-send-email-avagin@openvz.org
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      20afc60f
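       A stand-in model of the added check (types are stubs; in the
       kernel the test guards the user-stack walk in the x86 callchain
       code, which otherwise dereferences a released mm in the exit path
       shown in the trace above):

          #include <stddef.h>
          #include <stdio.h>

          struct mm_struct   { int users; };
          struct task_struct { struct mm_struct *mm; };

          static void perf_callchain_user_model(struct task_struct *task)
          {
              if (!task->mm)       /* the added liveness check */
                  return;
              /* ... safe to walk the user stack via task->mm ... */
              printf("walking user stack\n");
          }

          int main(void)
          {
              struct task_struct exiting = { .mm = NULL };
              perf_callchain_user_model(&exiting);  /* no NULL dereference */
              return 0;
          }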
  18. 14 Aug 2011, 1 commit
  19. 22 Jul 2011, 1 commit
  20. 15 Jul 2011, 1 commit
    • perf, x86: P4 PMU - Introduce event alias feature · f9129870
      Committed by Cyrill Gorcunov
       Instead of the hw_nmi_watchdog_set_attr() weak function
       and the corresponding x86_pmu::hw_watchdog_set_attr() call,
       we introduce an event alias mechanism which allows us
       to drop these routines completely and isolate the quirks
       of the Netburst architecture inside the P4 PMU code only.
      
      The main idea remains the same though -- to allow
      nmi-watchdog and perf top run simultaneously.
      
       Note the aliasing mechanism applies to the generic
       PERF_COUNT_HW_CPU_CYCLES event only, because an arbitrary
       event (say, one passed in as RAW initially) might have additional
       bits set inside the ESCR register, changing the behaviour of the
       event, so we can no longer guarantee that the alias event will
       give the same result.
      
       P.S. A huge thanks to Don and Steven for testing
            and early review.
       Acked-by: Don Zickus <dzickus@redhat.com>
       Tested-by: Steven Rostedt <rostedt@goodmis.org>
       Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      CC: Ingo Molnar <mingo@elte.hu>
      CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Stephane Eranian <eranian@google.com>
      CC: Lin Ming <ming.m.lin@intel.com>
      CC: Arnaldo Carvalho de Melo <acme@redhat.com>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
       Link: http://lkml.kernel.org/r/20110708201712.GS23657@sun
       Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f9129870
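       A toy model of the alias lookup (the encodings below are invented,
       not the real Netburst ESCR/CCCR values; only an exact match of the
       original CPU_CYCLES encoding is rewritten, for the reason given
       above):

          #include <stdint.h>
          #include <stdio.h>

          struct p4_event_alias {
              uint64_t original;
              uint64_t alternative;
          };

          static const struct p4_event_alias aliases[] = {
              { 0x0108, 0x0242 },  /* CPU_CYCLES -> equivalent encoding on
                                      another counter bank */
          };

          static uint64_t alias_config(uint64_t config)
          {
              for (unsigned i = 0; i < sizeof(aliases)/sizeof(aliases[0]); i++)
                  if (aliases[i].original == config)  /* exact match only */
                      return aliases[i].alternative;
              return 0;   /* no alias: raw events keep their own encoding */
          }

          int main(void)
          {
              printf("%#llx\n", (unsigned long long)alias_config(0x0108));
              printf("%#llx\n", (unsigned long long)alias_config(0x0109));
              return 0;
          }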
  21. 01 Jul 2011, 6 commits
  22. 12 May 2011, 1 commit
  23. 27 Apr 2011, 1 commit
    • perf, x86, nmi: Move LVT un-masking into irq handlers · 2bce5dac
      Committed by Don Zickus
       It was noticed that P4 machines were generating double NMIs for
       each perf event.  These extra NMIs led to 'Dazed and confused'
       messages on the screen.
      
      I tracked this down to a P4 quirk that said the overflow bit had
      to be cleared before re-enabling the apic LVT mask.  My first
      attempt was to move the un-masking inside the perf nmi handler
      from before the chipset NMI handler to after.
      
      This broke Nehalem boxes that seem to like the unmasking before
      the counters themselves are re-enabled.
      
       In order to keep this change simple for 2.6.39, I decided to
       simply move the apic LVT un-masking to the beginning of all
      the chipset NMI handlers, with the exception of Pentium4's to
      fix the double NMI issue.
      
      Later on we can move the un-masking to later in the handlers to
      save a number of 'extra' NMIs on those particular chipsets.
      
      I tested this change on a P4 machine, an AMD machine, a Nehalem
      box, and a core2quad box.  'perf top' worked correctly along
      with various other small 'perf record' runs.  Anything high
      stress breaks all the machines but that is a different problem.
      
      Thanks to various people for testing different versions of this
      patch.
       Reported-and-tested-by: Shaun Ruffell <sruffell@digium.com>
       Signed-off-by: Don Zickus <dzickus@redhat.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
       Link: http://lkml.kernel.org/r/1303900353-10242-1-git-send-email-dzickus@redhat.com
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      2bce5dac
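       A toy model of the ordering (the function names are stand-ins for
       the APIC LVTPC write and the PMU bookkeeping the handlers do):

          #include <stdbool.h>
          #include <stdio.h>

          /* most PMUs want the LVT performance-counter entry unmasked
           * early; P4 must clear its overflow bit first or a second
           * NMI is generated */
          static bool is_p4;

          static void unmask_lvtpc(void)     { puts("apic: LVTPC unmasked"); }
          static void clear_overflow(void)   { puts("pmu: overflow cleared"); }
          static void reenable_counters(void){ puts("pmu: counters on"); }

          static void perf_nmi_handler(void)
          {
              if (!is_p4)
                  unmask_lvtpc();   /* start of handler: Nehalem et al. */
              clear_overflow();
              if (is_p4)
                  unmask_lvtpc();   /* after clearing: avoids double NMI */
              reenable_counters();
          }

          int main(void)
          {
              is_p4 = true;
              perf_nmi_handler();
              return 0;
          }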
  24. 26 Apr 2011, 1 commit
  25. 22 Apr 2011, 1 commit
    • x86, perf event: Turn off unstructured raw event access to offcore registers · b52c55c6
      Committed by Ingo Molnar
       Andi Kleen pointed out that the Intel offcore support patches were merged
       without user-space tool support for the functionality:
      
       |
       | The offcore_msr perf kernel code was merged into 2.6.39-rc*, but the
       | user space bits were not. This made it impossible to set the extra mask
       | and actually do the OFFCORE profiling
       |
      
      Andi submitted a preliminary patch for user-space support, as an
      extension to perf's raw event syntax:
      
       |
       | Some raw events -- like the Intel OFFCORE events -- support additional
       | parameters. These can be appended after a ':'.
       |
       | For example on a multi socket Intel Nehalem:
       |
       |    perf stat -e r1b7:20ff -a sleep 1
       |
       | Profile the OFFCORE_RESPONSE.ANY_REQUEST with event mask REMOTE_DRAM_0
       | that measures any access to DRAM on another socket.
       |
      
      But this kind of usability is absolutely unacceptable - users should not
      be expected to type in magic, CPU and model specific incantations to get
      access to useful hardware functionality.
      
      The proper solution is to expose useful offcore functionality via
      generalized events - that way users do not have to care which specific
      CPU model they are using, they can use the conceptual event and not some
      model specific quirky hexa number.
      
      We already have such generalization in place for CPU cache events,
      and it's all very extensible.
      
      "Offcore" events measure general DRAM access patters along various
      parameters. They are particularly useful in NUMA systems.
      
      We want to support them via generalized DRAM events: either as the
      fourth level of cache (after the last-level cache), or as a separate
      generalization category.
      
      That way user-space support would be very obvious, memory access
      profiling could be done via self-explanatory commands like:
      
        perf record -e dram ./myapp
        perf record -e dram-remote ./myapp
      
      ... to measure DRAM accesses or more expensive cross-node NUMA DRAM
      accesses.
      
      These generalized events would work on all CPUs and architectures that
      have comparable PMU features.
      
       ( Note, these are just examples: the actual implementation could have
         more sophistication and more parameters - as long as they center
         around similarly simple use cases. )
      
      Now we do not want to revert *all* of the current offcore bits, as they
      are still somewhat useful for generic last-level-cache events, implemented
      in this commit:
      
        e994d7d2: perf: Fix LLC-* events on Intel Nehalem/Westmere
      
      But we definitely do not yet want to expose the unstructured raw events
      to user-space, until better generalization and usability is implemented
      for these hardware event features.
      
      ( Note: after generalization has been implemented raw offcore events can be
        supported as well: there can always be an odd event that is marginally
        useful but not useful enough to generalize. DRAM profiling is definitely
        *not* such a category so generalization must be done first. )
      
      Furthermore, PERF_TYPE_RAW access to these registers was not intended
      to go upstream without proper support - it was a side-effect of the above
      e994d7d2 commit, not mentioned in the changelog.
      
      As v2.6.39 is nearing release we go for the simplest approach: disable
      the PERF_TYPE_RAW offcore hack for now, before it escapes into a released
      kernel and becomes an ABI.
      
      Once proper structure is implemented for these hardware events and users
      are offered usable solutions we can revisit this issue.
       Reported-by: Andi Kleen <ak@linux.intel.com>
       Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
       Link: http://lkml.kernel.org/r/1302658203-4239-1-git-send-email-andi@firstfloor.org
       Signed-off-by: Ingo Molnar <mingo@elte.hu>
      b52c55c6
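       A toy model of the gatekeeping (stand-in types; the effect is that
       the extra offcore mask is no longer honoured for PERF_TYPE_RAW
       events, so the unstructured form cannot become ABI):

          #include <stdbool.h>
          #include <stdint.h>

          enum type { TYPE_RAW, TYPE_HW_CACHE };

          struct attr {
              enum type type;
              uint64_t  config1;   /* would-be OFFCORE_RESPONSE mask */
          };

          static bool offcore_allowed(const struct attr *a)
          {
              if (a->type == TYPE_RAW && a->config1)
                  return false;    /* e.g. 'perf stat -e r1b7:20ff' */
              return true;
          }

          int main(void)
          {
              struct attr raw = { TYPE_RAW, 0x20ff };
              return offcore_allowed(&raw) ? 1 : 0;   /* rejected: exit 0 */
          }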
  26. 19 Apr 2011, 1 commit