1. 24 5月, 2009 5 次提交
    • P
      perf_counter: Change pctrl() behaviour · 082ff5a2
      Peter Zijlstra 提交于
      Instead of en/dis-abling all counters acting on a particular
      task, en/dis- able all counters we created.
      
      [ v2: fix crash on first counter enable ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.916937244@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      082ff5a2
    • P
      perf_counter: Simplify context cleanup · aa9c67f5
      Peter Zijlstra 提交于
      Use perf_counter_remove_from_context() to remove counters from
      the context.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.796275849@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      aa9c67f5
    • P
      perf_counter: Sanitize context locking · 682076ae
      Peter Zijlstra 提交于
      Ensure we're consistent with the context locks.
      
       context->mutex
         context->lock
           list_{add,del}_counter();
      
      so that either lock is sufficient to stabilize the context.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.618790733@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      682076ae
    • P
      perf_counter: Sanitize counter->mutex · fccc714b
      Peter Zijlstra 提交于
      s/counter->mutex/counter->child_mutex/ and make sure its only
      used to protect child_list.
      
      The usage in __perf_counter_exit_task() doesn't appear to be
      problematic since ctx->mutex also covers anything related to fd
      tear-down.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.533186528@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      fccc714b
    • P
      perf_counter: Fix dynamic irq_period logging · e220d2dc
      Peter Zijlstra 提交于
      We call perf_adjust_freq() from perf_counter_task_tick() which
      is is called under the rq->lock causing lock recursion.
      However, it's no longer required to be called under the
      rq->lock, so remove it from under it.
      
      Also, fix up some related comments.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090523163012.476197912@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e220d2dc
  2. 22 5月, 2009 2 次提交
    • P
      perf_counter: Optimize context switch between identical inherited contexts · 564c2b21
      Paul Mackerras 提交于
      When monitoring a process and its descendants with a set of inherited
      counters, we can often get the situation in a context switch where
      both the old (outgoing) and new (incoming) process have the same set
      of counters, and their values are ultimately going to be added together.
      In that situation it doesn't matter which set of counters are used to
      count the activity for the new process, so there is really no need to
      go through the process of reading the hardware counters and updating
      the old task's counters and then setting up the PMU for the new task.
      
      This optimizes the context switch in this situation.  Instead of
      scheduling out the perf_counter_context for the old task and
      scheduling in the new context, we simply transfer the old context
      to the new task and keep using it without interruption.  The new
      context gets transferred to the old task.  This means that both
      tasks still have a valid perf_counter_context, so no special case
      is introduced when the old task gets scheduled in again, either on
      this CPU or another CPU.
      
      The equivalence of contexts is detected by keeping a pointer in
      each cloned context pointing to the context it was cloned from.
      To cope with the situation where a context is changed by adding
      or removing counters after it has been cloned, we also keep a
      generation number on each context which is incremented every time
      a context is changed.  When a context is cloned we take a copy
      of the parent's generation number, and two cloned contexts are
      equivalent only if they have the same parent and the same
      generation number.  In order that the parent context pointer
      remains valid (and is not reused), we increment the parent
      context's reference count for each context cloned from it.
      
      Since we don't have individual fds for the counters in a cloned
      context, the only thing that can make two clones of a given parent
      different after they have been cloned is enabling or disabling all
      counters with prctl.  To account for this, we keep a count of the
      number of enabled counters in each context.  Two contexts must have
      the same number of enabled counters to be considered equivalent.
      
      Here are some measurements of the context switch time as measured with
      the lat_ctx benchmark from lmbench, comparing the times obtained with
      and without this patch series:
      
      		-----Unmodified-----		With this patch series
      Counters:	none	2 HW	4H+4S	none	2 HW	4H+4S
      
      2 processes:
      Average		3.44	6.45	11.24	3.12	3.39	3.60
      St dev		0.04	0.04	0.13	0.05	0.17	0.19
      
      8 processes:
      Average		6.45	8.79	14.00	5.57	6.23	7.57
      St dev		1.27	1.04	0.88	1.42	1.46	1.42
      
      32 processes:
      Average		5.56	8.43	13.78	5.28	5.55	7.15
      St dev		0.41	0.47	0.53	0.54	0.57	0.81
      
      The numbers are the mean and standard deviation of 20 runs of
      lat_ctx.  The "none" columns are lat_ctx run directly without any
      counters.  The "2 HW" columns are with lat_ctx run under perfstat,
      counting cycles and instructions.  The "4H+4S" columns are lat_ctx run
      under perfstat with 4 hardware counters and 4 software counters
      (cycles, instructions, cache references, cache misses, task
      clock, context switch, cpu migrations, and page faults).
      
      [ Impact: performance optimization of counter context-switches ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10666.517218.332164@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      564c2b21
    • P
      perf_counter: Dynamically allocate tasks' perf_counter_context struct · a63eaf34
      Paul Mackerras 提交于
      This replaces the struct perf_counter_context in the task_struct with
      a pointer to a dynamically allocated perf_counter_context struct.  The
      main reason for doing is this is to allow us to transfer a
      perf_counter_context from one task to another when we do lazy PMU
      switching in a later patch.
      
      This has a few side-benefits: the task_struct becomes a little smaller,
      we save some memory because only tasks that have perf_counters attached
      get a perf_counter_context allocated for them, and we can remove the
      inclusion of <linux/perf_counter.h> in sched.h, meaning that we don't
      end up recompiling nearly everything whenever perf_counter.h changes.
      
      The perf_counter_context structures are reference-counted and freed
      when the last reference is dropped.  A context can have references
      from its task and the counters on its task.  Counters can outlive the
      task so it is possible that a context will be freed well after its
      task has exited.
      
      Contexts are allocated on fork if the parent had a context, or
      otherwise the first time that a per-task counter is created on a task.
      In the latter case, we set the context pointer in the task struct
      locklessly using an atomic compare-and-exchange operation in case we
      raced with some other task in creating a context for the subject task.
      
      This also removes the task pointer from the perf_counter struct.  The
      task pointer was not used anywhere and would make it harder to move a
      context from one task to another.  Anything that needed to know which
      task a counter was attached to was already using counter->ctx->task.
      
      The __perf_counter_init_context function moves up in perf_counter.c
      so that it can be called from find_get_context, and now initializes
      the refcount, but is otherwise unchanged.
      
      We were potentially calling list_del_counter twice: once from
      __perf_counter_exit_task when the task exits and once from
      __perf_counter_remove_from_context when the counter's fd gets closed.
      This adds a check in list_del_counter so it doesn't do anything if
      the counter has already been removed from the lists.
      
      Since perf_counter_task_sched_in doesn't do anything if the task doesn't
      have a context, and leaves cpuctx->task_ctx = NULL, this adds code to
      __perf_install_in_context to set cpuctx->task_ctx if necessary, i.e. in
      the case where the current task adds the first counter to itself and
      thus creates a context for itself.
      
      This also adds similar code to __perf_counter_enable to handle a
      similar situation which can arise when the counters have been disabled
      using prctl; that also leaves cpuctx->task_ctx = NULL.
      
      [ Impact: refactor counter context management to prepare for new feature ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18966.10075.781053.231153@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a63eaf34
  3. 21 5月, 2009 1 次提交
    • I
      perf_counter: Fix context removal deadlock · 34adc806
      Ingo Molnar 提交于
      Disable the PMU globally before removing a counter from a
      context. This fixes the following lockup:
      
      [22081.741922] ------------[ cut here ]------------
      [22081.746668] WARNING: at arch/x86/kernel/cpu/perf_counter.c:803 intel_pmu_handle_irq+0x9b/0x24e()
      [22081.755624] Hardware name: X8DTN
      [22081.758903] perfcounters: irq loop stuck!
      [22081.762985] Modules linked in:
      [22081.766136] Pid: 11082, comm: perf Not tainted 2.6.30-rc6-tip #226
      [22081.772432] Call Trace:
      [22081.774940]  <NMI>  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.781993]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.788368]  [<ffffffff8104505c>] ? warn_slowpath_common+0x77/0xa3
      [22081.794649]  [<ffffffff810450d3>] ? warn_slowpath_fmt+0x40/0x45
      [22081.800696]  [<ffffffff81019aed>] ? intel_pmu_handle_irq+0x9b/0x24e
      [22081.807080]  [<ffffffff814d1a72>] ? perf_counter_nmi_handler+0x3f/0x4a
      [22081.813751]  [<ffffffff814d2d09>] ? notifier_call_chain+0x58/0x86
      [22081.819951]  [<ffffffff8105b250>] ? notify_die+0x2d/0x32
      [22081.825392]  [<ffffffff814d1414>] ? do_nmi+0x8e/0x242
      [22081.830538]  [<ffffffff814d0f0a>] ? nmi+0x1a/0x20
      [22081.835342]  [<ffffffff8117e102>] ? selinux_file_free_security+0x0/0x1a
      [22081.842105]  [<ffffffff81018793>] ? x86_pmu_disable_counter+0x15/0x41
      [22081.848673]  <<EOE>>  [<ffffffff81018f3d>] ? x86_pmu_disable+0x86/0x103
      [22081.855512]  [<ffffffff8108fedd>] ? __perf_counter_remove_from_context+0x0/0xfe
      [22081.862926]  [<ffffffff8108fcbc>] ? counter_sched_out+0x30/0xce
      [22081.868909]  [<ffffffff8108ff36>] ? __perf_counter_remove_from_context+0x59/0xfe
      [22081.876382]  [<ffffffff8106808a>] ? smp_call_function_single+0x6c/0xe6
      [22081.882955]  [<ffffffff81091b96>] ? perf_release+0x86/0x14c
      [22081.888600]  [<ffffffff810c4c84>] ? __fput+0xe7/0x195
      [22081.893718]  [<ffffffff810c213e>] ? filp_close+0x5b/0x62
      [22081.899107]  [<ffffffff81046a70>] ? put_files_struct+0x64/0xc2
      [22081.905031]  [<ffffffff8104841a>] ? do_exit+0x1e2/0x6ef
      [22081.910360]  [<ffffffff814d0a60>] ? _spin_lock_irqsave+0x9/0xe
      [22081.916292]  [<ffffffff8104898e>] ? do_group_exit+0x67/0x93
      [22081.921953]  [<ffffffff810489cc>] ? sys_exit_group+0x12/0x16
      [22081.927759]  [<ffffffff8100baab>] ? system_call_fastpath+0x16/0x1b
      [22081.934076] ---[ end trace 3a3936ce3e1b4505 ]---
      
      And could potentially also fix the lockup reported by Marcelo Tosatti.
      
      Also, print more debug info in case of a detected lockup.
      
      [ Impact: fix lockup ]
      Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      34adc806
  4. 20 5月, 2009 6 次提交
    • P
      perf_counter: Optimize sched in/out of counters · afedadf2
      Peter Zijlstra 提交于
      Avoid a function call for !group counters by directly calling the counter
      function.
      
      [ Impact: micro-optimize the code ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.511933670@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      afedadf2
    • P
      perf_counter: Optimize disable of time based sw counters · b986d7ec
      Peter Zijlstra 提交于
      Currently we call hrtimer_cancel() unconditionally on disable of time based
      software counters. Avoid when possible.
      
      [ Impact: micro-optimize the code ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.388185031@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b986d7ec
    • P
      perf_counter: Log irq_period changes · 26b119bc
      Peter Zijlstra 提交于
      For the dynamic irq_period code, log whenever we change the period so that
      analyzing code can normalize the event flow.
      
      [ Impact: add new feature to allow more precise profiling ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.298769743@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      26b119bc
    • P
      perf_counter: Solve the rotate_ctx vs inherit race differently · d7b629a3
      Peter Zijlstra 提交于
      Instead of disabling RR scheduling of the counters, use a different list
      that does not get rotated to iterate the counters on inheritance.
      
      [ Impact: cleanup, optimization ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: John Kacur <jkacur@redhat.com>
      LKML-Reference: <20090520102553.237504544@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d7b629a3
    • I
      perf_counter: fix counter inheritance race · c44d70a3
      Ingo Molnar 提交于
      Context rotation should not occur when we are in the middle of
      walking the counter list when inheriting counters ...
      
      [ Impact: fix occasionally incorrect perf stat results ]
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c44d70a3
    • I
      perf_counter: fix counter freeing logic · 33b2fb30
      Ingo Molnar 提交于
      Fix counter lifetime bugs which explain the crashes reported by
      Marcelo Tosatti and Arnaldo Carvalho de Melo.
      
      The new rule is: flushing + freeing is only done for a task's
      own counters, never for other tasks.
      
      [ Impact: fix crashes/lockups with inherited counters ]
      Reported-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      Reported-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      33b2fb30
  5. 17 5月, 2009 1 次提交
    • P
      perf_counter: Fix inheritance cleanup code · 8bc20959
      Peter Zijlstra 提交于
      Clean up code that open-coded the list_{add,del}_counter() code in
      __perf_counter_exit_task() which consequently diverged. This could
      lead to software counter crashes.
      
      Also, fold the ctx->nr_counter inc/dec into those functions and clean
      up some of the related code.
      
      [ Impact: fix potential sw counter crash, cleanup ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8bc20959
  6. 15 5月, 2009 7 次提交
    • P
      perf_counter: allow arch to supply event misc flags and instruction pointer · 9d23a90a
      Paul Mackerras 提交于
      At present the values we put in overflow events for the misc
      flags indicating processor mode and the instruction pointer are
      obtained using the standard user_mode() and
      instruction_pointer() functions. Those functions tell you where
      the performance monitor interrupt was taken, which might not be
      exactly where the counter overflow occurred, for example
      because interrupts were disabled at the point where the
      overflow occurred, or because the processor had many
      instructions in flight and chose to complete some more
      instructions beyond the one that caused the counter overflow.
      
      Some architectures (e.g. powerpc) can supply more precise
      information about where the counter overflow occurred and the
      processor mode at that point.  This introduces new functions,
      perf_misc_flags() and perf_instruction_pointer(), which arch
      code can override to provide more precise information if
      available.  They have default implementations which are
      identical to the existing code.
      
      This also adds a new misc flag value,
      PERF_EVENT_MISC_HYPERVISOR, for the case where a counter
      overflow occurred in the hypervisor.  We encode the processor
      mode in the 2 bits previously used to indicate user or kernel
      mode; the values for user and kernel mode are unchanged and
      hypervisor mode is indicated by both bits being set.
      
      [ Impact: generalize perfcounter core facilities ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <18956.1272.818511.561835@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9d23a90a
    • P
      perf_counter: frequency based adaptive irq_period, 32-bit fix · 2e569d36
      Peter Zijlstra 提交于
      fix:
      
        kernel/built-in.o: In function `perf_counter_alloc':
        perf_counter.c:(.text+0x7ddc7): undefined reference to `__udivdi3'
      
      [ Impact: build fix on 32-bit systems ]
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <1242394667.6642.1887.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2e569d36
    • P
      perf_counter: frequency based adaptive irq_period · 60db5e09
      Peter Zijlstra 提交于
      Instead of specifying the irq_period for a counter, provide a target interrupt
      frequency and dynamically adapt the irq_period to match this frequency.
      
      [ Impact: new perf-counter attribute/feature ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20090515132018.646195868@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      60db5e09
    • P
      perf_counter: per user mlock gift · 789f90fc
      Peter Zijlstra 提交于
      Instead of a per-process mlock gift for perf-counters, use a
      per-user gift so that there is less of a DoS potential.
      
      [ Impact: allow less worst-case unprivileged memory consumption ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20090515132018.496182835@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      789f90fc
    • P
      perf_counter: remove perf_disable/enable exports · 548e1ddf
      Peter Zijlstra 提交于
      Now that ACPI idle doesn't use it anymore, remove the exports.
      
      [ Impact: remove dead code/data ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <20090515132018.429826617@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      548e1ddf
    • P
      perf_counter: Rework the perf counter disable/enable · 9e35ad38
      Peter Zijlstra 提交于
      The current disable/enable mechanism is:
      
      	token = hw_perf_save_disable();
      	...
      	/* do bits */
      	...
      	hw_perf_restore(token);
      
      This works well, provided that the use nests properly. Except we don't.
      
      x86 NMI/INT throttling has non-nested use of this, breaking things. Therefore
      provide a reference counter disable/enable interface, where the first disable
      disables the hardware, and the last enable enables the hardware again.
      
      [ Impact: refactor, simplify the PMU disable/enable logic ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9e35ad38
    • P
      perf_counter: Fix perf_output_copy() WARN to account for overflow · 53020fe8
      Peter Zijlstra 提交于
      The simple reservation test in perf_output_copy() failed to take
      unsigned int overflow into account, fix this.
      
      [ Impact: fix false positive warning with more than 4GB of profiling data ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      53020fe8
  7. 12 5月, 2009 1 次提交
    • P
      perf_counter: call hw_perf_save_disable/restore around group_sched_in · e758a33d
      Paul Mackerras 提交于
      I noticed that when enabling a group via the PERF_COUNTER_IOC_ENABLE
      ioctl on the group leader, the counters weren't enabled and counting
      immediately on return from the ioctl, but did start counting a little
      while later (presumably after a context switch).
      
      The reason was that __perf_counter_enable calls group_sched_in which
      calls hw_perf_group_sched_in, which on powerpc assumes that the caller
      has called hw_perf_save_disable already.  Until commit 46d686c6
      ("perf_counter: put whole group on when enabling group leader") it was
      true that all callers of group_sched_in had called
      hw_perf_save_disable first, and the powerpc hw_perf_group_sched_in
      relies on that (there isn't an x86 version).
      
      This fixes the problem by putting calls to hw_perf_save_disable /
      hw_perf_restore around the calls to group_sched_in and
      counter_sched_in in __perf_counter_enable.  Having the calls to
      hw_perf_save_disable/restore around the counter_sched_in call is
      harmless and makes this call consistent with the other call sites
      of counter_sched_in, which have all called hw_perf_save_disable first.
      
      [ Impact: more precise counter group disable/enable functionality ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18953.25733.53359.147452@cargo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e758a33d
  8. 11 5月, 2009 3 次提交
    • P
      perf_counter: call atomic64_set for counter->count · 615a3f1e
      Paul Mackerras 提交于
      A compile warning triggered because we are calling
      atomic_set(&counter->count). But since counter->count
      is an atomic64_t, we have to use atomic64_set.
      
      So the count can be set short, resulting in the reset ioctl
      only resetting the low word.
      
      [ Impact: clear counter properly during the reset ioctl ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18951.48285.270311.981806@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      615a3f1e
    • P
      perf_counter: don't count scheduler ticks as context switches · a08b159f
      Paul Mackerras 提交于
      The context-switch software counter gives inflated values at present
      because each scheduler tick and each process-wide counter
      enable/disable prctl gets counted as a context switch.
      
      This happens because perf_counter_task_tick, perf_counter_task_disable
      and perf_counter_task_enable all call perf_counter_task_sched_out,
      which calls perf_swcounter_event to record a context switch event.
      
      This fixes it by introducing a variant of perf_counter_task_sched_out
      with two underscores in front for internal use within the perf_counter
      code, and makes perf_counter_task_{tick,disable,enable} call it.  This
      variant doesn't record a context switch event, and takes a struct
      perf_counter_context *.  This adds the new variant rather than
      changing the behaviour or interface of perf_counter_task_sched_out
      because that is called from other code.
      
      [ Impact: fix inflated context-switch event counts ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18951.48034.485580.498953@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a08b159f
    • P
      perf_counter: Put whole group on when enabling group leader · 6751b71e
      Paul Mackerras 提交于
      Currently, if you have a group where the leader is disabled and there
      are siblings that are enabled, and then you enable the leader, we only
      put the leader on the PMU, and not its enabled siblings.  This is
      incorrect, since the enabled group members should be all on or all off
      at any given point.
      
      This fixes it by adding a call to group_sched_in in
      __perf_counter_enable in the case where we're enabling a group leader.
      
      To avoid the need for a forward declaration this also moves
      group_sched_in up before __perf_counter_enable.  The actual content of
      group_sched_in is unchanged by this patch.
      
      [ Impact: fix bug in counter enable code ]
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <18951.34946.451546.691693@drongo.ozlabs.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6751b71e
  9. 09 5月, 2009 4 次提交
    • P
      perf_counter: add PERF_RECORD_CPU · f370e1e2
      Peter Zijlstra 提交于
      Allow recording the CPU number the event was generated on.
      
      RFC: this leaves a u32 as reserved, should we fill in the
           node_id() there, or leave this open for future extention,
           as userspace can already easily do the cpu->node mapping
           if needed.
      
      [ Impact: extend perfcounter output record format ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090508170029.008627711@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f370e1e2
    • P
      perf_counter: add PERF_RECORD_CONFIG · a85f61ab
      Peter Zijlstra 提交于
      Much like CONFIG_RECORD_GROUP records the hw_event.config to
      identify the values, allow to record this for all counters.
      
      [ Impact: extend perfcounter output record format ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090508170028.923228280@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a85f61ab
    • P
      perf_counter: rework ioctl()s · 3df5edad
      Peter Zijlstra 提交于
      Corey noticed that ioctl()s on grouped counters didn't work on
      the whole group. This extends the ioctl() interface to take a
      second argument that is interpreted as a flags field. We then
      provide PERF_IOC_FLAG_GROUP to toggle the behaviour.
      
      Having this flag gives the greatest flexibility, allowing you
      to individually enable/disable/reset counters in a group, or
      all together.
      
      [ Impact: fix group counter enable/disable semantics ]
      Reported-by: NCorey Ashford <cjashfor@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <20090508170028.837558214@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3df5edad
    • P
      perf_counter: optimize perf_counter_task_tick() · 7fc23a53
      Peter Zijlstra 提交于
      perf_counter_task_tick() does way too much work to find out
      there's nothing to do. Provide an easy short-circuit for the
      normal case where there are no counters on the system.
      
      [ Impact: micro-optimization ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090508170028.750619201@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7fc23a53
  10. 06 5月, 2009 5 次提交
  11. 05 5月, 2009 3 次提交
  12. 01 5月, 2009 1 次提交
    • P
      perf_counter: fix race in perf_output_* · c33a0bc4
      Peter Zijlstra 提交于
      When two (or more) contexts output to the same buffer, it is possible
      to observe half written output.
      
      Suppose we have CPU0 doing perf_counter_mmap(), CPU1 doing
      perf_counter_overflow(). If CPU1 does a wakeup and exposes head to
      user-space, then CPU2 can observe the data CPU0 is still writing.
      
      [ Impact: fix occasionally corrupted profiling records ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      LKML-Reference: <20090501102533.007821627@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c33a0bc4
  13. 30 4月, 2009 1 次提交