1. 09 8月, 2009 2 次提交
    • F
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker 提交于
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f413cdb8
    • P
      perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Peter Zijlstra 提交于
      Adds possible second part to the assign argument of TP_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified make the swcounter increment with @foo instead
      of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
      associated with the event) when this triggers a counter overflow.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a659305
  2. 07 8月, 2009 1 次提交
    • P
      perf_counter: Fix double list iteration in per task precise stats · 1054598c
      Peter Zijlstra 提交于
      Brice Goglin reported this crash with per task precise stats:
      
      > I finally managed to test the threaded perfcounter statistics (thanks a
      > lot for implementing it). I am running 2.6.31-rc5 (with the AMD
      > magny-cours patches but I don't think they matter here). I am trying to
      > measure local/remote memory accesses per thread during the well-known
      > stream benchmark. It's compiled with OpenMP using 16 threads on a
      > quad-socket quad-core barcelona machine.
      >
      > Command line is:
      >  /mnt/scratch/bgoglin/cpunode/linux-2.6.31/tools/perf/perf record -f -s
      > -e r1000001e0 -e r1000002e0 -e r1000004e0 -e r1000008e0 ./stream
      >
      > It seems to work fine with a single -e <counter> on the command line
      > while it crashes when there are at least 2 of them.
      > It seems to work fine without -s as well.
      
      A silly copy-paste resulted in a messed up iteration which would
      cause the OOPS.
      Reported-by: NBrice Goglin <Brice.Goglin@inria.fr>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: NBrice Goglin <Brice.Goglin@inria.fr>
      LKML-Reference: <1249574786.32113.550.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      1054598c
  3. 02 8月, 2009 2 次提交
  4. 23 7月, 2009 5 次提交
  5. 18 7月, 2009 1 次提交
    • A
      perf_counter: Make sure we dont leak kernel memory to userspace · 413ee3b4
      Anton Blanchard 提交于
      There are a few places we are leaking tiny amounts of kernel
      memory to userspace. This happens when writing out strings
      because we always align the end to 64 bits.
      
      To avoid this we should always use an appropriately sized
      temporary buffer and ensure it is zeroed.
      
      Since d_path assembles the string from the end of the buffer
      backwards, we need to add 64 bits after the buffer to allow for
      alignment.
      
      We also need to copy arch_vma_name to the temporary buffer,
      because if we use it directly we may end up copying to
      userspace a number of bytes after the end of the string
      constant.
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20090716104817.273972048@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      413ee3b4
  6. 13 7月, 2009 1 次提交
    • C
      perf_counter: Fix the tracepoint channel to perfcounters · d4d7d0b9
      Chris Wilson 提交于
      Fix a missed rename in EVENT_PROFILE support so that it gets
      built and allows tracepoint tracing from the 'perf' tool.
      
      Fix a typo in the (never before built & enabled) portion in
      perf_counter.c as well, and update that code to the
      attr.config changes as well.
      Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Cc: Ben Gamari <bgamari.foss@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d4d7d0b9
  7. 10 7月, 2009 1 次提交
  8. 07 7月, 2009 1 次提交
    • K
      Fix virt_to_phys() warnings · 5bfd7560
      Kevin Cernekee 提交于
      These warnings were observed on MIPS32 using 2.6.31-rc1 and gcc-4.2.0:
      
      mm/page_alloc.c: In function 'alloc_pages_exact':
      mm/page_alloc.c:1986: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast
      
      drivers/usb/mon/mon_bin.c: In function 'mon_alloc_buff':
      drivers/usb/mon/mon_bin.c:1264: warning: passing argument 1 of 'virt_to_phys' makes pointer from integer without a cast
      
      [akpm@linux-foundation.org: fix kernel/perf_counter.c too]
      Signed-off-by: NKevin Cernekee <cernekee@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5bfd7560
  9. 30 6月, 2009 1 次提交
  10. 26 6月, 2009 6 次提交
  11. 23 6月, 2009 3 次提交
  12. 20 6月, 2009 1 次提交
  13. 19 6月, 2009 2 次提交
    • P
      perf_counter: Close race in perf_lock_task_context() · b49a9e7e
      Peter Zijlstra 提交于
      perf_lock_task_context() is buggy because it can return a dead
      context.
      
      the RCU read lock in perf_lock_task_context() only guarantees
      the memory won't get freed, it doesn't guarantee the object is
      valid (in our case refcount > 0).
      
      Therefore we can return a locked object that can get freed the
      moment we release the rcu read lock.
      
      perf_pin_task_context() then increases the refcount and does an
      unlock on freed memory.
      
      That increased refcount will cause a double free, in case it
      started out with 0.
      
      Ammend this by including the get_ctx() functionality in
      perf_lock_task_context() (all users already did this later
      anyway), and return a NULL context when the found one is
      already dead.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b49a9e7e
    • P
      perf_counter: Simplify and fix task migration counting · e5289d4a
      Peter Zijlstra 提交于
      The task migrations counter was causing rare and hard to decypher
      memory corruptions under load. After a day of debugging and bisection
      we found that the problem was introduced with:
      
        3f731ca6: perf_counter: Fix cpu migration counter
      
      Turning them off fixes the crashes. Incidentally, the whole
      perf_counter_task_migration() logic can be done simpler as well,
      by injecting a proper sw-counter event.
      
      This cleanup also fixed the crashes. The precise failure mode is
      not completely clear yet, but we are clearly not unhappy about
      having a fix ;-)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e5289d4a
  14. 18 6月, 2009 1 次提交
    • P
      perf_counter: Add event overlow handling · 43a21ea8
      Peter Zijlstra 提交于
      Alternative method of mmap() data output handling that provides
      better overflow management and a more reliable data stream.
      
      Unlike the previous method, that didn't have any user->kernel
      feedback and relied on userspace keeping up, this method relies on
      userspace writing its last read position into the control page.
      
      It will ensure new output doesn't overwrite not-yet read events,
      new events for which there is no space left are lost and the
      overflow counter is incremented, providing exact event loss
      numbers.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      43a21ea8
  15. 15 6月, 2009 1 次提交
  16. 13 6月, 2009 2 次提交
  17. 12 6月, 2009 2 次提交
    • P
      perf_counter: Add forward/backward attribute ABI compatibility · 974802ea
      Peter Zijlstra 提交于
      Provide for means of extending the perf_counter_attr in a 'natural' way.
      
      We allow growing the structure by appending fields at the end by specifying
      the full structure size inside it.
      
      When a new kernel sees a smaller (old) structure, it will 0 pad the tail.
      When an old kernel sees a larger (new) structure, it will verify the tail
      consists of 0s, otherwise fail.
      
      If we fail due to a size-mismatch, we return -E2BIG and write the kernel's
      native attribe size back into the provided structure.
      
      Furthermore, add some attribute verification, so that we'll fail counter
      creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
      the __reserved fields).
      
      (This ABI detail is introduced while keeping the existing syscall ABI.)
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      974802ea
    • P
      perf_counter: Remove PERF_TYPE_RAW special casing · 081fad86
      Peter Zijlstra 提交于
      The PERF_TYPE_RAW special case seems superfluous these days. Remove
      it and add it to the switch() stmt like the others.
      
      [ Impact: cleanup ]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      081fad86
  18. 11 6月, 2009 7 次提交