1. 04 Feb, 2015 3 commits
  2. 28 Jan, 2015 1 commit
    • perf: Tighten (and fix) the grouping condition · c3c87e77
      Peter Zijlstra committed
      The fix from 9fc81d87 ("perf: Fix events installation during
      moving group") was incomplete in that it failed to recognise that
      creating a group with events for different CPUs is semantically
      broken -- they cannot be co-scheduled.
      
      Furthermore, it leads to real breakage: when we create an event
      for CPU Y and then migrate it to form a group on CPU X, the code gets
      confused about where the counter is programmed. I triggered this in
      practice with the perf fuzzer.
      
      Fix this by tightening the rules for creating groups. Only allow
      grouping of counters that can be co-scheduled in the same context.
      This means events for the same task and/or the same CPU.
      
      Fixes: 9fc81d87 ("perf: Fix events installation during moving group")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
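The tightened rule can be pictured as a predicate over a group leader and a prospective member. The following is a minimal sketch, not the kernel's code: `struct event` and `can_group()` are hypothetical names, and the struct keeps only the two properties the commit message mentions (target task and target CPU).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a perf event; not the kernel struct. */
struct event {
    const void *task; /* target task, or NULL for a per-CPU event */
    int cpu;          /* target CPU, or -1 for "any CPU" */
};

/*
 * Returns true only when both events target the same CPU and the same
 * task, which is the condition under which they could be co-scheduled.
 */
static bool can_group(const struct event *leader, const struct event *ev)
{
    assert(leader != NULL && ev != NULL);

    if (leader->cpu != ev->cpu)
        return false; /* events for different CPUs cannot share a group */
    if (leader->task != ev->task)
        return false; /* different tasks live in different contexts */
    return true;
}
```

Under these assumptions, two per-task events with `cpu == -1` for the same task may be grouped, while two per-CPU events bound to CPU 0 and CPU 1 may not.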
  3. 14 Jan, 2015 1 commit
    • perf: Avoid horrible stack usage · 86038c5e
      Peter Zijlstra (Intel) committed
      Both Linus (most recently) and Steve (a while ago) reported that
      perf-related callbacks have massive stack bloat.
      
      The problem is that software events need a pt_regs in order to
      properly report the event location and unwind the stack. And because
      we could not assume one was present, we allocated one on the stack and
      filled it with the minimal bits required for operation.
      
      Now, pt_regs is quite large, so this is undesirable. Furthermore it
      turns out that most sites actually have a pt_regs pointer available,
      making this even more onerous, as the stack space is pointless waste.
      
      This patch addresses the problem by observing that software events
      have well defined nesting semantics, therefore we can use static
      per-cpu storage instead of on-stack.
      
      Linus made the further observation that all but the scheduler callers
      of perf_sw_event() have a pt_regs available, so we change the regular
      perf_sw_event() to require a valid pt_regs (where it used to be
      optional) and add perf_sw_event_sched() for the scheduler.
      
      We have a scheduler-specific call instead of a more generic _noregs()
      like construct because we can assume non-recursion from the scheduler
      and thereby simplify the code further (_noregs would have to put the
      recursion context call inline in order to ascertain which __perf_regs
      element to use).
      
      One last note on the implementation of perf_trace_buf_prepare(); we
      allow .regs = NULL for those cases where we already have a pt_regs
      pointer available and do not need another.

      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Javi Merino <javi.merino@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Petr Mladek <pmladek@suse.cz>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
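The idea of replacing an on-stack register snapshot with static per-CPU storage, one slot per nesting level, can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel implementation: `NR_CPUS`, `struct regs`, `get_regs_slot()` and `put_regs_slot()` are hypothetical, and a plain two-dimensional array stands in for real per-CPU variables.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* assumed small CPU count for the sketch */

/* Nesting levels a software event can fire in (assumed set of four). */
enum ctx_level { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ, CTX_NMI, CTX_MAX };

/* Stand-in for pt_regs; the real structure is architecture specific. */
struct regs {
    unsigned long ip;
    unsigned long sp;
    unsigned long flags;
};

/* One preallocated slot per CPU and nesting level, instead of a stack copy. */
static struct regs regs_storage[NR_CPUS][CTX_MAX];
static int slot_in_use[NR_CPUS][CTX_MAX];

/* Claim the slot for (cpu, level); NULL signals recursion at that level. */
static struct regs *get_regs_slot(int cpu, enum ctx_level level)
{
    assert(cpu >= 0 && cpu < NR_CPUS && level < CTX_MAX);

    if (slot_in_use[cpu][level])
        return NULL;
    slot_in_use[cpu][level] = 1;
    return &regs_storage[cpu][level];
}

/* Release the slot once the event has been reported. */
static void put_regs_slot(int cpu, enum ctx_level level)
{
    assert(cpu >= 0 && cpu < NR_CPUS && level < CTX_MAX);
    slot_in_use[cpu][level] = 0;
}
```

Because an interrupt can only nest on top of a lower level, each level needs at most one slot per CPU, which is why static storage is sufficient in this model.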
  4. 09 Jan, 2015 1 commit
  5. 11 Dec, 2014 1 commit
    • perf: Fix events installation during moving group · 9fc81d87
      Jiri Olsa committed
      We allow the PMU driver to change the CPU on which the event
      should be installed. This happened in patch:
      
        e2d37cd2 ("perf: Allow the PMU driver to choose the CPU on which to install events")
      
      This patch also forces all the group members to follow
      the currently opened event's CPU if the group happens
      to be moved.
      
      This and the change of event->cpu in perf_install_in_context()
      function introduced in:
      
        0cda4c02 ("perf: Introduce perf_pmu_migrate_context()")
      
      forces group members to change their event->cpu,
      if the currently-opened-event's PMU changed the cpu
      and there is a group move.
      
      The above behaviour causes a problem for breakpoint events,
      which use event->cpu to touch CPU-specific data for
      breakpoint accounting. By changing event->cpu, some
      breakpoint slots were wrongly accounted for the given
      CPU.
      
      Vince's perf fuzzer hit this issue and caused the following
      WARN on my setup:
      
         WARNING: CPU: 0 PID: 20214 at arch/x86/kernel/hw_breakpoint.c:119 arch_install_hw_breakpoint+0x142/0x150()
         Can't find any breakpoint slot
         [...]
      
      This patch changes the group moving code to keep the event's
      original cpu.

      Reported-by: Vince Weaver <vince@deater.net>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1418243031-20367-3-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
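Why rewriting event->cpu upsets per-CPU accounting can be shown with a toy model. This is only a sketch with hypothetical names (`struct bp_event`, `reserve_slot()`, `release_slot()`, `slots_in_use`); the real breakpoint accounting is more involved.

```c
#include <assert.h>

#define NR_CPUS 2 /* assumed CPU count for the sketch */

/* Hypothetical breakpoint event that only remembers its target CPU. */
struct bp_event {
    int cpu;
};

/* Per-CPU count of breakpoint slots currently reserved. */
static int slots_in_use[NR_CPUS];

/* Account one slot on the CPU the event currently names. */
static void reserve_slot(const struct bp_event *ev)
{
    assert(ev->cpu >= 0 && ev->cpu < NR_CPUS);
    slots_in_use[ev->cpu]++;
}

/* Give the slot back, again keyed by the event's current CPU. */
static void release_slot(const struct bp_event *ev)
{
    assert(ev->cpu >= 0 && ev->cpu < NR_CPUS);
    slots_in_use[ev->cpu]--;
}
```

In this model, if an event reserves a slot with `cpu == 0` and its `cpu` field is then rewritten to 1 before the release, CPU 0 keeps a stale reservation and CPU 1 is decremented for a slot it never held. Leaving the event's original CPU untouched keeps reserve and release paired.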
  6. 20 Nov, 2014 1 commit
  7. 16 Nov, 2014 3 commits
  8. 28 Oct, 2014 1 commit
    • perf: Fix and clean up initialization of pmu::event_idx · c719f560
      Peter Zijlstra committed
      Andy reported that the current state of event_idx is rather confused.
      So remove all but the x86_pmu implementation and change the default to
      return 0 (the safe option).

      Reported-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Cody P Schafer <cody@linux.vnet.ibm.com>
      Cc: Cody P Schafer <dev@codyps.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
      Cc: Himangi Saraogi <himangi774@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: sukadev@linux.vnet.ibm.com <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Huth <thuth@linux.vnet.ibm.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux390@de.ibm.com
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: linux-s390@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
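The pattern described here, an optional driver callback with a conservative default, can be sketched as below. The names `struct pmu_ops`, `pmu_register()`, `event_idx_default()` and `hw_event_idx()` are hypothetical and the structures are reduced to what the example needs; only the "default returns 0" behaviour comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event carrying a hardware counter index. */
struct event {
    int hw_idx;
};

/* Hypothetical PMU description with an optional event_idx callback. */
struct pmu_ops {
    int (*event_idx)(const struct event *ev);
};

/* Safe default: report 0, i.e. no usable index. */
static int event_idx_default(const struct event *ev)
{
    (void)ev;
    return 0;
}

/* Fill in the default when the driver supplied no callback. */
static void pmu_register(struct pmu_ops *pmu)
{
    assert(pmu != NULL);
    if (!pmu->event_idx)
        pmu->event_idx = event_idx_default;
}

/* Example of a driver-specific callback (assumed convention: index + 1). */
static int hw_event_idx(const struct event *ev)
{
    return ev->hw_idx + 1;
}
```

A driver that knows how to expose a counter index installs its own callback; every other driver falls back to 0 after registration.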
  9. 03 Oct, 2014 3 commits
  10. 24 Sep, 2014 3 commits
  11. 19 Sep, 2014 1 commit
  12. 16 Sep, 2014 1 commit
  13. 09 Sep, 2014 2 commits
  14. 27 Aug, 2014 1 commit
  15. 24 Aug, 2014 2 commits
  16. 20 Aug, 2014 1 commit
  17. 13 Aug, 2014 3 commits
  18. 16 Jul, 2014 3 commits
    • perf: Add vm_ops->name call for mmap event name retrieval · fbe26abe
      Jiri Olsa committed
      The following patch added another way to get the mmap name: 78d683e8
      ("mm, fs: Add vm_ops->name as an alternative to arch_vma_name")

      The vdso vma mapping has already switched to this, so we no longer get
      the vdso name via the arch_vma_name function. Add this way to the perf
      mmap event name retrieval code.
      
      Caught this via perf test:
      
        $ sudo ./perf test -v 7
         7: Validate PERF_RECORD_* events & perf_sample fields     :
        --- start ---
      
      SNIP
      
        PERF_RECORD_MMAP for [vdso] missing!
        test child finished with 255
        ---- end ----
        Validate PERF_RECORD_* events & perf_sample fields: FAILED!

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1405353439-14211-1-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
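The lookup order implied by this change can be sketched as a chain of fallbacks. This is a simplified illustration: `struct vma`, `struct vm_ops`, `arch_name()`, `vdso_name()` and `mmap_event_name()` are hypothetical stand-ins, and the real code handles further cases (heap, stack, path lookup) that are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vma;

/* Hypothetical operations table with an optional name callback. */
struct vm_ops {
    const char *(*name)(const struct vma *vma);
};

/* Hypothetical mapping: a backing file path and/or an ops table. */
struct vma {
    const struct vm_ops *ops;
    const char *file_path;
};

/* Stand-in for the architecture hook; here it never knows a name. */
static const char *arch_name(const struct vma *vma)
{
    (void)vma;
    return NULL;
}

/* Example callback a special mapping such as the vdso could provide. */
static const char *vdso_name(const struct vma *vma)
{
    (void)vma;
    return "[vdso]";
}

/* Resolve the name reported for a mapping, trying each source in turn. */
static const char *mmap_event_name(const struct vma *vma)
{
    const char *name;

    assert(vma != NULL);
    if (vma->file_path)
        return vma->file_path;
    if (vma->ops && vma->ops->name) {
        name = vma->ops->name(vma); /* the newly consulted source */
        if (name)
            return name;
    }
    name = arch_name(vma);
    if (name)
        return name;
    return "//anon";
}
```

Without the `ops->name` step, a mapping that only advertises its name through the callback would fall through to the anonymous label, which matches the missing `[vdso]` record seen in the test output above.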
    • perf: Fix lockdep warning on process exit · 4a1c0f26
      Peter Zijlstra committed
      Sasha Levin reported:
      
      > While fuzzing with trinity inside a KVM tools guest running the latest -next
      > kernel I've stumbled on the following spew:
      >
      > ======================================================
      > [ INFO: possible circular locking dependency detected ]
      > 3.15.0-next-20140613-sasha-00026-g6dd125d-dirty #654 Not tainted
      > -------------------------------------------------------
      > trinity-c578/9725 is trying to acquire lock:
      > (&(&pool->lock)->rlock){-.-...}, at: __queue_work (kernel/workqueue.c:1346)
      >
      > but task is already holding lock:
      > (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      > which lock already depends on the new lock.
      
      > 1 lock held by trinity-c578/9725:
      > #0: (&ctx->lock){-.....}, at: perf_event_exit_task (kernel/events/core.c:7471 kernel/events/core.c:7533)
      >
      >  Call Trace:
      >  dump_stack (lib/dump_stack.c:52)
      >  print_circular_bug (kernel/locking/lockdep.c:1216)
      >  __lock_acquire (kernel/locking/lockdep.c:1840 kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131 kernel/locking/lockdep.c:3182)
      >  lock_acquire (./arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602)
      >  _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151)
      >  __queue_work (kernel/workqueue.c:1346)
      >  queue_work_on (kernel/workqueue.c:1424)
      >  free_object (lib/debugobjects.c:209)
      >  __debug_check_no_obj_freed (lib/debugobjects.c:715)
      >  debug_check_no_obj_freed (lib/debugobjects.c:727)
      >  kmem_cache_free (mm/slub.c:2683 mm/slub.c:2711)
      >  free_task (kernel/fork.c:221)
      >  __put_task_struct (kernel/fork.c:250)
      >  put_ctx (include/linux/sched.h:1855 kernel/events/core.c:898)
      >  perf_event_exit_task (kernel/events/core.c:907 kernel/events/core.c:7478 kernel/events/core.c:7533)
      >  do_exit (kernel/exit.c:766)
      >  do_group_exit (kernel/exit.c:884)
      >  get_signal_to_deliver (kernel/signal.c:2347)
      >  do_signal (arch/x86/kernel/signal.c:698)
      >  do_notify_resume (arch/x86/kernel/signal.c:751)
      >  int_signal (arch/x86/kernel/entry_64.S:600)
      
      Urgh.. so the only way I can make that happen is through:
      
        perf_event_exit_task_context()
          raw_spin_lock(&child_ctx->lock);
          unclone_ctx(child_ctx)
            put_ctx(ctx->parent_ctx);
          raw_spin_unlock_irqrestore(&child_ctx->lock);
      
      And we can avoid this by doing the change below.
      
      I can't immediately see how this changed recently, but given that you
      say it's easy to reproduce, let's fix this.

      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140623141242.GB19860@laptop.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
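One way to avoid dropping a reference while a lock is held is to detach the pointer under the lock and perform the final put only after unlocking. The sketch below models that idea in plain C. It is an assumption about the shape of such a fix, not the actual patch: the lock is a simple flag, and `struct ctx`, `lock_held` and `puts_under_lock` are hypothetical, while `unclone_ctx()` and `put_ctx()` reuse names from the call chain above in heavily simplified form.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context with a reference count and an optional parent. */
struct ctx {
    int refcount;
    struct ctx *parent;
};

static int lock_held;       /* models the child context lock */
static int puts_under_lock; /* counts puts issued while the lock is held */

/* Drop a reference; in real code this may free memory and take other locks. */
static void put_ctx(struct ctx *c)
{
    assert(c != NULL);
    if (lock_held)
        puts_under_lock++;
    c->refcount--;
}

/* Detach the parent and hand it back, leaving the put to the caller. */
static struct ctx *unclone_ctx(struct ctx *c)
{
    struct ctx *parent = c->parent;

    c->parent = NULL;
    return parent;
}

static void exit_task_context(struct ctx *child)
{
    struct ctx *parent;

    assert(child != NULL);
    lock_held = 1;               /* lock taken */
    parent = unclone_ctx(child); /* only pointer manipulation under the lock */
    lock_held = 0;               /* lock released */

    if (parent)
        put_ctx(parent);         /* potentially heavy work happens unlocked */
}
```

In this model `puts_under_lock` stays at zero, which corresponds to the freeing path no longer running inside the locked region.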
    • perf: Revert ("perf: Always destroy groups on exit") · 1903d50c
      Peter Zijlstra committed
      Vince reported that commit 15a2d4de ("perf: Always destroy groups
      on exit") causes a regression with grouped events. In particular his
      read_group_attached.c test fails.
      
        https://github.com/deater/perf_event_tests/blob/master/tests/bugs/read_group_attached.c
      
      Because of the context switch optimization in
      perf_event_context_sched_out() the 'original' event may end up in the
      child process and when that exits the change in the patch in question
      destroys the actual grouping.
      
      Therefore revert that change and only destroy inherited groups.

      Reported-by: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-zedy3uktcp753q8fw8dagx7a@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 05 Jul, 2014 1 commit
  20. 02 Jul, 2014 1 commit
    • perf: Do not allow optimized switch for non-cloned events · 1f9a7268
      Jiri Olsa committed
      The context check in perf_event_context_sched_out allows
      a non-cloned context to be part of the optimized schedule
      out switch.

      This could move a non-cloned context into another workload
      child. Once this child exits, the context is closed and
      leaves all original (parent) events in closed state.

      Any other new cloned event will have closed state and not
      measure anything. This probably causes other odd bugs as well.

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: <stable@vger.kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1403598026-2310-2-git-send-email-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
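Read literally, the restriction means the context swap shortcut is taken only when both contexts are clones. A minimal sketch of such a check follows. It reflects one reading of the commit message rather than the actual diff: `struct ctx` and `may_swap_contexts()` are hypothetical, and details such as generation counters and pinning are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical context: a clone points at the context it was cloned from. */
struct ctx {
    const struct ctx *parent; /* NULL for an original (non-cloned) context */
};

/*
 * Allow the swap shortcut only when both contexts are clones of the same
 * original context. A non-cloned context never takes part.
 */
static bool may_swap_contexts(const struct ctx *cur, const struct ctx *next)
{
    assert(cur != NULL && next != NULL);

    if (!cur->parent || !next->parent)
        return false; /* at least one side is an original context */
    return cur->parent == next->parent;
}
```

With this predicate, an original context can no longer be swapped into a child, so it cannot be closed when that child exits.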
  21. 09 Jun, 2014 2 commits
  22. 06 Jun, 2014 2 commits
    • perf: Differentiate exec() and non-exec() comm events · 82b89778
      Adrian Hunter committed
      perf tools like 'perf report' can aggregate samples by comm strings,
      which generally works.  However, there are other potential use-cases.
      For example, to pair up 'calls' with 'returns' accurately (from branch
      events like Intel BTS) it is necessary to identify whether the process
      has exec'd.  Although a comm event is generated when an 'exec' happens
      it is also generated whenever the comm string is changed on a whim
      (e.g. by prctl PR_SET_NAME).  This patch adds a flag to the comm event
      to differentiate one case from the other.
      
      In order to determine whether the kernel supports the new flag, a
      selection bit named 'exec' is added to struct perf_event_attr.  The
      bit does nothing but will cause perf_event_open() to fail if the bit
      is set on kernels that do not have it defined.

      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/537D9EBE.7030806@intel.com
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
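How a tool might consume such a flag can be sketched as follows. Everything here is illustrative: `COMM_EXEC_FLAG` and its value are assumptions made for the sketch (the real definition lives in the kernel's perf_event.h), and `struct comm_record`, `struct call_tracker` and `handle_comm()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed flag bit for this sketch; check perf_event.h for the real one. */
#define COMM_EXEC_FLAG (1u << 13)

/* Hypothetical decoded comm record. */
struct comm_record {
    uint16_t misc;    /* header flags */
    uint32_t pid;
    const char *comm; /* new command name */
};

/* Hypothetical per-process state used to pair calls with returns. */
struct call_tracker {
    unsigned int depth;
};

/* True when the comm change was caused by exec rather than a rename. */
static bool comm_is_exec(const struct comm_record *rec)
{
    assert(rec != NULL);
    return (rec->misc & COMM_EXEC_FLAG) != 0;
}

/* On exec the old call stack is meaningless, so start over. */
static void handle_comm(struct call_tracker *t, const struct comm_record *rec)
{
    assert(t != NULL);
    if (comm_is_exec(rec))
        t->depth = 0;
    /* A plain rename (e.g. via prctl) leaves the call stack untouched. */
}
```

A rename keeps the tracked depth, while an exec resets it, which is the distinction the commit message asks for.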
    • perf: Fix perf_event_comm() vs. exec() assumption · e041e328
      Peter Zijlstra committed
      perf_event_comm() assumes that set_task_comm() is only called on
      exec(), and in particular that it's only called on current.

      Neither is true, as Dave reported a WARN triggered by set_task_comm()
      being called on !current.
      
      Separate the exec() hook from the comm hook.

      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140521153219.GH5226@laptop.programming.kicks-ass.net
      [ Build fix. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  23. 05 Jun, 2014 2 commits