1. 31 Mar, 2016 1 commit
  2. 21 Mar, 2016 1 commit
    • perf/core: Fix Undefined behaviour in rb_alloc() · 8184059e
      Committed by Peter Zijlstra
      Sasha reported:
      
       [ 3494.030114] UBSAN: Undefined behaviour in kernel/events/ring_buffer.c:685:22
       [ 3494.030647] shift exponent -1 is negative
      
      Andrey spotted that this is because:
      
        It happens if nr_pages = 0:
           rb->page_order = ilog2(nr_pages);
      
      Fix it by making both assignments conditional on nr_pages; since
      otherwise they should both be 0 anyway, and will be because of the
      kzalloc() used to allocate the structure.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Reported-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/20160129141751.GA407@worktop
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8184059e
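A minimal user-space sketch of the guard described in 8184059e. The `struct rb_sketch` type, its two fields, and the `floor_log2_nonzero()` helper are made up for illustration; `calloc()` stands in for `kzalloc()`, and `__builtin_clz()` assumes GCC or Clang.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's ring buffer descriptor. */
struct rb_sketch {
    int has_pages;   /* illustrative field, not a kernel name */
    int page_order;
};

/* floor(log2(n)); like ilog2(), it must never be called with n == 0. */
static int floor_log2_nonzero(unsigned int n)
{
    return 31 - __builtin_clz(n);
}

static struct rb_sketch *rb_sketch_alloc(unsigned int nr_pages)
{
    /* calloc() zeroes the structure, so both fields start at 0. */
    struct rb_sketch *rb = calloc(1, sizeof(*rb));

    if (!rb)
        return NULL;

    /* Only compute the order when there is at least one data page. */
    if (nr_pages) {
        rb->has_pages = 1;
        rb->page_order = floor_log2_nonzero(nr_pages);
    }
    return rb;
}
```

With `nr_pages == 0` the log helper is never reached, and the zeroed allocation already holds the values the commit message says both fields should have.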
  3. 22 Jan, 2016 1 commit
    • perf: Synchronously free aux pages in case of allocation failure · 45c815f0
      Committed by Alexander Shishkin
      We are currently using asynchronous deallocation in the error path in
      AUX mmap code, which is unnecessary and also presents a problem for users
      that wish to probe for the biggest possible buffer size they can get:
      they'll get -EINVAL on all subsequent attempts to allocate a smaller
      buffer before the asynchronous deallocation callback frees up the pages
      from the previous unsuccessful attempt.
      
      Currently, gdb does that for allocating AUX buffers for Intel PT traces.
      More specifically, overwrite mode of AUX pmus that don't support hardware
      sg (some implementations of Intel PT, for instance) is limited to only
      one contiguous high order allocation for its buffer and there is no way
      of knowing its size without trying.
      
      This patch changes error path freeing to be synchronous as there won't
      be any contenders for the AUX pages at that point.
      Reported-by: Markus Metzger <markus.t.metzger@intel.com>
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/1453216469-9509-1-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45c815f0
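The probing pattern that 45c815f0 is meant to keep working can be sketched roughly as follows. Everything here is hypothetical: `probe_largest()`, the `try_alloc_fn` callback, and the `limit_alloc()` fake allocator are illustration names, not gdb or kernel code. The loop only works if a failed attempt has released its pages by the time the next, smaller attempt runs.

```c
#include <stddef.h>

/* Hypothetical allocator callback: 0 on success, negative on failure. */
typedef int (*try_alloc_fn)(size_t size, void *ctx);

/*
 * Halve the request until the allocator accepts it or the size drops
 * below min_size. Returns the accepted size, or 0 if nothing fit.
 */
static size_t probe_largest(size_t start, size_t min_size,
                            try_alloc_fn try_alloc, void *ctx)
{
    for (size_t size = start; size >= min_size && size > 0; size /= 2) {
        if (try_alloc(size, ctx) == 0)
            return size;
    }
    return 0;
}

/* Fake allocator for demonstration: succeeds up to the limit in *ctx. */
static int limit_alloc(size_t size, void *ctx)
{
    size_t limit = *(size_t *)ctx;

    return size <= limit ? 0 : -1;
}
```

Calling `probe_largest(262144, 4096, limit_alloc, &limit)` with `limit` set to 40000 walks down through 131072 and 65536 before settling on 32768.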
  4. 23 Nov, 2015 1 commit
    • treewide: Remove old email address · 90eec103
      Committed by Peter Zijlstra
      There were still a number of references to my old Red Hat email
      address in the kernel source. Remove these while keeping the
      Red Hat copyright notices intact.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      90eec103
  5. 04 Nov, 2015 1 commit
    • atomic: remove all traces of READ_ONCE_CTRL() and atomic*_read_ctrl() · 105ff3cb
      Committed by Linus Torvalds
      This seems to be a mis-reading of how alpha memory ordering works, and
      is not backed up by the alpha architecture manual.  The helper functions
      don't do anything special on any other architectures, and the arguments
      that support them being safe on other architectures also argue that they
      are safe on alpha.
      
      Basically, the "control dependency" is between a previous read and a
      subsequent write that is dependent on the value read.  Even if the
      subsequent write is actually done speculatively, there is no way that
      such a speculative write could be made visible to other cpu's until it
      has been committed, which requires validating the speculation.
      
      Note that most weakly ordered architectures (very much including alpha)
      do not guarantee any ordering relationship between two loads that depend
      on each other on a control dependency:
      
          read A
          if (val == 1)
              read B
      
      because the conditional may be predicted, and the "read B" may be
      speculatively moved up to before reading the value A.  So we require the
      user to insert a smp_rmb() between the two accesses to be correct:
      
          read A;
          if (A == 1)
              smp_rmb()
              read B
      
      Alpha is further special in that it can break that ordering even if the
      *address* of B depends on the read of A, because the cacheline that is
      read later may be stale unless you have a memory barrier in between the
      pointer read and the read of the value behind a pointer:
      
          read ptr
          read offset(ptr)
      
      whereas all other weakly ordered architectures guarantee that the data
      dependency (as opposed to just a control dependency) will order the two
      accesses.  As a result, alpha needs a "smp_read_barrier_depends()" in
      between those two reads for them to be ordered.
      
      The control dependency that "READ_ONCE_CTRL()" and "atomic_read_ctrl()"
      had was a control dependency to a subsequent *write*, however, and
      nobody can finalize such a subsequent write without having actually done
      the read.  And were you to write such a value to a "stale" cacheline
      (the way the unordered reads came to be), that would seem to lose the
      write entirely.
      
      So the things that make alpha able to re-order reads even more
      aggressively than other weak architectures do not seem to be relevant
      for a subsequent write.  Alpha memory ordering may be strange, but
      there's no real indication that it is *that* strange.
      
      Also, the alpha architecture reference manual very explicitly talks
      about the definition of "Dependence Constraints" in section 5.6.1.7,
      where a preceding read dominates a subsequent write.
      
      Such a dependence constraint admittedly does not impose a BEFORE (alpha
      architecture term for globally visible ordering), but it does guarantee
      that there can be no "causal loop".  I don't see how you could avoid
      such a loop if another cpu could see the stored value and then impact
      the value of the first read.  Put another way: the read and the write
      could not be seen as being out of order wrt other cpus.
      
      So I do not see how these "x_ctrl()" functions can currently be necessary.
      
      I may have to eat my words at some point, but in the absence of clear
      proof that alpha actually needs this, or indeed even an explanation of
      how alpha could _possibly_ need it, I do not believe these functions are
      called for.
      
      And if it turns out that alpha really _does_ need a barrier for this
      case, that barrier still should not be "smp_read_barrier_depends()".
      We'd have to make up some new speciality barrier just for alpha, along
      with the documentation for why it really is necessary.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul E McKenney <paulmck@us.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      105ff3cb
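The two cases that 105ff3cb contrasts can be sketched in user space as below. This is only an illustration: `READ_ONCE_SKETCH()` and `WRITE_ONCE_SKETCH()` are volatile-cast stand-ins for the kernel's `READ_ONCE()`/`WRITE_ONCE()`, the C11 acquire fence is a (stronger) stand-in for `smp_rmb()`, and the variable names are made up. The C standard itself does not promise control-dependency ordering; the sketch mirrors the hardware-level argument in the commit message, not a language guarantee. `__typeof__` assumes GCC or Clang.

```c
#include <stdatomic.h>

/* Assumed stand-ins for the kernel macros; not the kernel definitions. */
#define READ_ONCE_SKETCH(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE_SKETCH(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int flag_a, data_b, out_c;

/*
 * Load followed by a dependent store: the store cannot be made visible
 * to other CPUs until the branch on flag_a has been resolved, so the
 * commit argues no extra barrier is needed here.
 */
static void load_then_store(void)
{
    if (READ_ONCE_SKETCH(flag_a) == 1)
        WRITE_ONCE_SKETCH(out_c, 1);
}

/*
 * Load followed by a dependent load: the branch may be predicted and
 * the second load speculated early, so a read barrier is required.
 */
static int load_then_load(void)
{
    if (READ_ONCE_SKETCH(flag_a) == 1) {
        atomic_thread_fence(memory_order_acquire); /* smp_rmb() stand-in */
        return READ_ONCE_SKETCH(data_b);
    }
    return -1;
}
```

A single-threaded run only shows the functional behaviour; the ordering difference between the two functions matters once another CPU writes `flag_a` and `data_b` concurrently.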
  6. 12 Aug, 2015 2 commits
  7. 06 Jul, 2015 1 commit
  8. 28 May, 2015 1 commit
    • smp: Make control dependencies work on Alpha, improve documentation · 5af4692a
      Committed by Paul E. McKenney
      The current formulation of control dependencies fails on DEC Alpha,
      which does not respect dependencies of any kind unless an explicit
      memory barrier is provided.  This means that the current formulation of
      control dependencies fails on Alpha.  This commit therefore creates a
      READ_ONCE_CTRL() that has the same overhead on non-Alpha systems, but
      causes Alpha to produce the needed ordering.  This commit also applies
      READ_ONCE_CTRL() to the one known use of control dependencies.
      
      Use of READ_ONCE_CTRL() also has the beneficial effect of adding a bit
      of self-documentation to control dependencies.
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      5af4692a
  9. 27 May, 2015 1 commit
  10. 02 Apr, 2015 6 commits
    • perf: Add wakeup watermark control to the AUX area · 1a594131
      Committed by Alexander Shishkin
      When AUX area gets a certain amount of new data, we want to wake up
      userspace to collect it. This adds a new control to specify how much
      data will cause a wakeup. This is then passed down to pmu drivers via
      output handle's "wakeup" field, so that the driver can find the nearest
      point where it can generate an interrupt.
      
      We repurpose __reserved_2 in the event attribute for this. Even though
      it was never checked to be zero before, aux_watermark will only matter
      for new AUX-aware code, so the old code should still be fine.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-10-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1a594131
    • perf: Support overwrite mode for the AUX area · 2023a0d2
      Committed by Alexander Shishkin
      This adds support for overwrite mode in the AUX area, which means "keep
      collecting data till you're stopped", turning AUX area into a circular
      buffer, where new data overwrites old data. It does not depend on data
      buffer's overwrite mode, so that it doesn't lose sideband data that is
      instrumental for processing AUX data.
      
      Overwrite mode is enabled by mapping the AUX area read-only. Even though
      aux_tail in the buffer's user page might be user writable, it will be
      ignored in this mode.
      
      A PERF_RECORD_AUX with PERF_AUX_FLAG_OVERWRITE set is written to the perf
      data stream every time an event writes new data to the AUX area. The pmu
      driver might not be able to infer the exact beginning of the new data in
      each snapshot; some drivers will only provide the tail, which is
      aux_offset + aux_size in the AUX record. Consumer has to be able to tell
      the new data from the old one, for example, by means of time stamps if
      such are provided in the trace.
      
      Consumer is also responsible for disabling any events that might write
      to the AUX area (thus potentially racing with the consumer) before
      collecting the data.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-9-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      2023a0d2
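How a consumer might linearize a snapshot from a circular buffer like the one 2023a0d2 describes can be sketched as follows. The sketch assumes the consumer knows a free-running `head` (total bytes ever written) and that writers are already stopped; the commit notes that some drivers only report the tail of the new data, so a real consumer may need time stamps or similar to separate new data from old. `snapshot_ring()` is a made-up name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the valid contents of a circular buffer into `out`, ordered from
 * oldest to newest byte.
 *   buf, size: the ring (size > 0)
 *   head:      total number of bytes ever written (free-running)
 * Returns the number of bytes copied, at most `size`.
 */
static size_t snapshot_ring(const unsigned char *buf, size_t size,
                            size_t head, unsigned char *out)
{
    /* Once the writer has wrapped, the whole ring holds valid data. */
    size_t valid = head < size ? head : size;
    /* Ring position of the oldest surviving byte. */
    size_t start = (head - valid) % size;
    /* Bytes available before the end of the ring is reached. */
    size_t first = size - start < valid ? size - start : valid;

    memcpy(out, buf + start, first);
    memcpy(out + first, buf, valid - first);
    return valid;
}
```

For an 8-byte ring that has received 11 bytes, the snapshot holds bytes 3 through 10 of the stream; the first three were overwritten.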
    • perf: Add API for PMUs to write to the AUX area · fdc26706
      Committed by Alexander Shishkin
      For pmus that wish to write data to ring buffer's AUX area, provide
      perf_aux_output_{begin,end}() calls to initiate/commit data writes,
      similarly to perf_output_{begin,end}. These also use the same output
      handle structure. Also, similarly to software counterparts, these
      will direct inherited events' output to parents' ring buffers.
      
      After the perf_aux_output_begin() returns successfully, handle->size
      is set to the maximum amount of data that can be written wrt aux_tail
      pointer, so that no data that the user hasn't seen will be overwritten,
      therefore this should always be called before hardware writing is
      enabled. On success, this will return the pointer to pmu driver's
      private structure allocated for this aux area by pmu::setup_aux. Same
      pointer can also be retrieved using perf_get_aux() while hardware
      writing is enabled.
      
      PMU driver should pass the actual amount of data written as a parameter
      to perf_aux_output_end(). All hardware writes should be completed and
      visible before this one is called.
      
      Additionally, perf_aux_output_skip() will adjust output handle and
      aux_head in case some part of the buffer has to be skipped over to
      maintain hardware's alignment constraints.
      
      Nested writers are forbidden and guards are in place to catch such
      attempts.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-8-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fdc26706
    • perf: Add a capability for AUX_NO_SG pmus to do software double buffering · 6a279230
      Committed by Alexander Shishkin
      For pmus that don't support scatter-gather for AUX data in hardware, it
      might still make sense to implement software double buffering to avoid
      losing data while the user is reading data out. For this purpose, add
      a pmu capability that guarantees multiple high-order chunks for AUX buffer,
      so that the pmu driver can do switchover tricks.
      
      To make use of this feature, add PERF_PMU_CAP_AUX_SW_DOUBLEBUF to your
      pmu's capability mask. This will make the ring buffer AUX allocation code
      ensure that the biggest high order allocation for the aux buffer pages is
      no bigger than half of the total requested buffer size, thus making sure
      that the buffer has at least two high order allocations.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-5-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6a279230
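The size constraint in 6a279230 (largest chunk no bigger than half the buffer) can be sketched as a cap on the allocation order. `aux_max_order()` and `floor_log2()` are hypothetical helpers written for this illustration, and the `-1` return convention is an assumption; `__builtin_clz()` assumes GCC or Clang.

```c
/* floor(log2(n)) for n > 0. */
static int floor_log2(unsigned int n)
{
    return 31 - __builtin_clz(n);
}

/*
 * Largest allocation order to use for a buffer of nr_pages pages.
 * With double buffering requested, the order is lowered by one so that
 * the biggest chunk covers at most half of the buffer, which leaves
 * room for at least two chunks. Returns -1 if that is impossible.
 */
static int aux_max_order(unsigned int nr_pages, int double_buffer)
{
    int max_order;

    if (nr_pages == 0)
        return -1;

    max_order = floor_log2(nr_pages);
    if (double_buffer) {
        if (max_order == 0)
            return -1; /* one page cannot be split into two chunks */
        max_order--;
    }
    return max_order;
}
```

For a 16-page buffer this gives order 4 (one 16-page chunk) without double buffering and order 3 (two 8-page chunks) with it.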
    • perf: Support high-order allocations for AUX space · 0a4e38e6
      Committed by Alexander Shishkin
      Some pmus (such as BTS or Intel PT without multiple-entry ToPA capability)
      don't support scatter-gather and will prefer larger contiguous areas for
      their output regions.
      
      This patch adds a new pmu capability to request higher order allocations.
      Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-4-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0a4e38e6
    • perf: Add AUX area to ring buffer for raw data streams · 45bfb2e5
      Committed by Peter Zijlstra
      This patch introduces "AUX space" in the perf mmap buffer, intended for
      exporting high bandwidth data streams to userspace, such as instruction
      flow traces.
      
      AUX space is a ring buffer, defined by aux_{offset,size} fields in the
      user_page structure, and read/write pointers aux_{head,tail}, which abide
      by the same rules as data_* counterparts of the main perf buffer.
      
      In order to allocate/mmap AUX, userspace needs to set up aux_offset to
      such an offset that will be greater than data_offset+data_size and
      aux_size to be the desired buffer size. Both need to be page aligned.
      Then, same aux_offset and aux_size should be passed to mmap() call and
      if everything adds up, you should have an AUX buffer as a result.
      
      Pages that are mapped into this buffer also come out of user's mlock
      rlimit plus perf_event_mlock_kb allowance.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kaixu Xia <kaixu.xia@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robert Richter <rric@kernel.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@infradead.org
      Cc: adrian.hunter@intel.com
      Cc: kan.liang@intel.com
      Cc: markus.t.metzger@intel.com
      Cc: mathieu.poirier@linaro.org
      Link: http://lkml.kernel.org/r/1421237903-181015-3-git-send-email-alexander.shishkin@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45bfb2e5
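The layout rules from 45bfb2e5 can be written down as a small user-space check. `struct aux_layout` is a hypothetical mirror of the four user-page fields named in the commit message, and `aux_layout_ok()` is an illustration, not the kernel's validation. The sketch assumes `page_size` is non-zero and treats an AUX area that starts exactly at the end of the data area as valid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the user-page fields named in the commit. */
struct aux_layout {
    uint64_t data_offset;
    uint64_t data_size;
    uint64_t aux_offset;
    uint64_t aux_size;
};

/* Check the constraints userspace has to satisfy before mmap()ing AUX. */
static bool aux_layout_ok(const struct aux_layout *l, uint64_t page_size)
{
    if (l->aux_size == 0)
        return false;

    /* Both the offset and the size need to be page aligned. */
    if (l->aux_offset % page_size || l->aux_size % page_size)
        return false;

    /* The AUX area must lie past the end of the data area. */
    return l->aux_offset >= l->data_offset + l->data_size;
}
```

A caller would then pass the same `aux_offset` and `aux_size` to `mmap()` on the perf event file descriptor, as the commit message describes.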
  11. 04 Feb, 2015 1 commit
  12. 11 Dec, 2013 1 commit
  13. 06 Nov, 2013 6 commits
    • perf: Update a stale comment · 394570b7
      Committed by Peter Zijlstra
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-9s5mze78gmlz19agt39i8rii@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      394570b7
    • perf: Optimize perf_output_begin() -- address calculation · 524feca5
      Committed by Peter Zijlstra
      Rewrite the handle address calculation code to be clearer.
      
      Saves 8 bytes on x86_64-defconfig.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-3trb2n2henb9m27tncef3ag7@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      524feca5
    • perf: Optimize perf_output_begin() -- lost_event case · d20a973f
      Committed by Peter Zijlstra
      Avoid touching the lost_event and sample_data cachelines twice. It's
      not like we end up doing less work, but it might help to keep all
      accesses to these cachelines in one place.
      
      Due to code shuffle, this loses 4 bytes on x86_64-defconfig.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-zfxnc58qxj0eawdoj31hhupv@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      d20a973f
    • perf: Optimize perf_output_begin() · 85f59edf
      Committed by Peter Zijlstra
      There's no point in re-doing the memory-barrier when we fail the
      cmpxchg(). Also placing it after the space reservation loop makes it
      clearer it only separates the userpage->tail read from the data
      stores.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-c19u6egfldyx86tpyc3zgkw9@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85f59edf
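The barrier placement that 85f59edf describes can be sketched with C11 atomics. This is a user-space illustration under assumptions: `struct ring_sketch` and `reserve_sketch()` are made-up names, positions are free-running byte counters, and a sequentially consistent fence stands in for the kernel's memory barrier. The point is only the shape: the barrier sits after the compare-and-exchange loop, so a failed exchange retries without paying for it again.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ring descriptor; positions are free-running counters. */
struct ring_sketch {
    atomic_ulong head;   /* producer reservation cursor */
    atomic_ulong tail;   /* consumer cursor (userpage->tail in the commit) */
    unsigned long size;  /* capacity in bytes */
};

/* Reserve len bytes; on success *start receives the reserved offset. */
static bool reserve_sketch(struct ring_sketch *rb, unsigned long len,
                           unsigned long *start)
{
    unsigned long tail, head;
    unsigned long offset =
        atomic_load_explicit(&rb->head, memory_order_relaxed);

    do {
        tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
        if (rb->size - (offset - tail) < len)
            return false; /* not enough free space */
        head = offset + len;
        /* On failure, offset is refreshed with the current head. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &rb->head, &offset, head,
                 memory_order_relaxed, memory_order_relaxed));

    /*
     * One barrier, after the loop: it separates the read of the tail
     * from the data stores the caller is about to make.
     */
    atomic_thread_fence(memory_order_seq_cst);

    *start = offset;
    return true;
}
```

The caller writes its record at `*start` (modulo the ring size) only after `reserve_sketch()` returns true.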
    • perf: Add unlikely() to the ring-buffer code · c72b42a3
      Committed by Peter Zijlstra
      Add unlikely() annotations to 'slow' paths:
      
      When having a sampling event but no output buffer; you have bigger
      issues -- also the bail is still faster than actually doing the work.
      
      When having a sampling event but a control page only buffer, you have
      bigger issues -- again the bail is still faster than actually doing
      work.
      
      Optimize for the case where you're not losing events -- again, not
      doing the work is still faster but make sure that when you have to
      actually do work it's as fast as possible.
      
      The typical watermark is 1/2 the buffer size, so most events will not
      take this path.
      
      Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-wlg3jew3qnutm8opd0hyeuwn@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c72b42a3
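The kind of annotation c72b42a3 adds can be illustrated with a toy writer. `unlikely_sketch()` is defined here on top of the GCC/Clang builtin `__builtin_expect()`; `struct out_buf` and `write_record()` are hypothetical and only mimic the three slow paths the commit message lists.

```c
#include <stddef.h>

/* Branch hint in the style of the kernel's unlikely(). */
#define unlikely_sketch(x) __builtin_expect(!!(x), 0)

/* Hypothetical output buffer bookkeeping. */
struct out_buf {
    unsigned long size; /* 0 models a control-page-only buffer */
    unsigned long used;
    unsigned long lost; /* records dropped for lack of space */
};

/* Account for a record of len bytes; 0 on success, -1 on bail-out. */
static int write_record(struct out_buf *buf, unsigned long len)
{
    /* Slow path 1: no output buffer at all. */
    if (unlikely_sketch(buf == NULL))
        return -1;

    /* Slow path 2: buffer has no data pages. */
    if (unlikely_sketch(buf->size == 0))
        return -1;

    /* Slow path 3: no room, so the event is lost. */
    if (unlikely_sketch(buf->size - buf->used < len)) {
        buf->lost++;
        return -1;
    }

    /* Fast path: the compiler can lay this out as the fall-through. */
    buf->used += len;
    return 0;
}
```

The hints do not change behaviour; they only tell the compiler which side of each branch to favour when arranging the code.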
    • perf: Simplify the ring-buffer code · 26c86da8
      Committed by Peter Zijlstra
      By using CIRC_SPACE() we can obviate the need for perf_output_space().
      
      Shrinks the size of perf_output_begin() by 17 bytes on
      x86_64-defconfig.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-vtb0xb0llebmsdlfn1v5vtfj@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      26c86da8
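For readers unfamiliar with `CIRC_SPACE()`, the sketch below reproduces the two macros as they are commonly defined in the kernel's `include/linux/circ_buf.h` (quoted from memory, so treat the exact text as an assumption) and wraps them in a hypothetical `ring_has_room()` helper. The macros require `size` to be a power of two.

```c
#include <stdbool.h>

/* Number of bytes currently stored in the ring. */
#define CIRC_CNT(head, tail, size) (((head) - (tail)) & ((size) - 1))

/* Free space; one slot always stays empty to tell full from empty. */
#define CIRC_SPACE(head, tail, size) CIRC_CNT((tail), ((head) + 1), (size))

/* Hypothetical helper: can a record of len bytes be written? */
static bool ring_has_room(unsigned long head, unsigned long tail,
                          unsigned long size, unsigned long len)
{
    return CIRC_SPACE(head, tail, size) >= len;
}
```

With an 8-byte ring, `head == 5` and `tail == 2`, three bytes are in use and `CIRC_SPACE()` reports four free bytes, the eighth being the reserved slot.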
  14. 29 Oct, 2013 1 commit
  15. 01 May, 2013 1 commit
  16. 21 Mar, 2013 1 commit
    • perf: Fix ring_buffer perf_output_space() boundary calculation · dd9c086d
      Committed by Stephane Eranian
      This patch fixes a flaw in perf_output_space(). In case the size
      of the space needed is bigger than the actual buffer size, there
      may be situations where the function would return true (i.e.,
      there is space) when it should not, because rounding in the masking
      logic can still leave head > offset.
      
      The problem can be tested by activating BTS on Intel processors.
      A BTS record can be as big as 16 pages. The following command
      fails:
      
        $ perf record -m 4 -c 1 -e branches:u my_test_program
      
      You will get a buffer corruption with this. Perf report won't be
      able to parse the perf.data.
      
      The fix is to first check that the requested space is smaller
      than the buffer size. If so, then the masking logic will work
      fine. If not, then there is no chance the record can be saved
      and it will be gracefully handled by upper code layers.
      
      [ In v2, we also make the logic for the writable more explicit by
        renaming it to rb->overwrite because it tells whether or not the
        buffer can overwrite its tail (suggested by PeterZ). ]
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: fweisbec@gmail.com
      Link: http://lkml.kernel.org/r/20130318133327.GA3056@quad
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dd9c086d
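The fix in dd9c086d can be sketched as a size check placed ahead of the masking arithmetic. `output_space_sketch()` is a user-space illustration with made-up names; it leaves out the overwrite case mentioned in the v2 note and assumes a power-of-two `size` with free-running positions.

```c
#include <stdbool.h>

/*
 * Space check for a ring of `size` bytes.
 *   tail:   consumer position
 *   offset: current write position
 *   head:   write position after the new record (offset + length)
 */
static bool output_space_sketch(unsigned long size, unsigned long tail,
                                unsigned long offset, unsigned long head)
{
    unsigned long mask = size - 1;

    /*
     * A record bigger than the whole buffer can never fit. Reject it
     * before masking, which would otherwise hide the overflow.
     */
    if (head - offset > size)
        return false;

    offset = (offset - tail) & mask;
    head = (head - tail) & mask;

    /* A head that wrapped past the tail ends up below offset. */
    return (long)(head - offset) >= 0;
}
```

Without the first check, a 20-byte record in an empty 16-byte ring masks down to `head == 4`, `offset == 0`, and the function would wrongly report that there is space.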
  17. 10 Aug, 2012 2 commits
    • perf: Add perf_output_skip function to skip bytes in sample · 5685e0ff
      Committed by Jiri Olsa
      Introducing perf_output_skip function to be able to skip data within the
      perf ring buffer.
      
      When writing data into perf ring buffer we first reserve needed place in
      ring buffer and then copy the actual data.
      
      There's a possibility we won't be able to fill all the reserved size
      with data, so we need a way to skip the remaining bytes.
      
      This is going to be useful when storing the user stack dump, where we
      might end up with less data than we originally requested.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      5685e0ff
    • perf: Factor __output_copy to be usable with specific copy function · 91d7753a
      Committed by Frederic Weisbecker
      Adding a generic way to use __output_copy function with specific copy
      function via DEFINE_PERF_OUTPUT_COPY macro.
      
      Using this to add new __output_copy_user function, that provides output
      copy from user pointers. For x86 the copy_from_user_nmi function is used
      and __copy_from_user_inatomic for the rest of the architectures.
      
      This new function will be used in user stack dump on sample, coming in
      next patches.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.com
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      91d7753a
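The idea of stamping out one output-copy routine per copy primitive can be sketched roughly like this. It is a simplified user-space illustration, not the kernel macro: `DEFINE_TOY_OUTPUT_COPY`, `struct toy_handle`, `toy_memcpy()` and `toy_copy_partial()` are all hypothetical, and the convention that a primitive returns the number of bytes it failed to copy is an assumption made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the output handle. */
struct toy_handle {
	unsigned char *addr;	/* next byte to write */
	size_t size;		/* bytes still available */
};

/*
 * Generate func_name() on top of copy_func(). In this sketch copy_func()
 * returns the number of bytes it could NOT copy (0 on full success), and
 * the generated function returns the number of bytes left unwritten.
 */
#define DEFINE_TOY_OUTPUT_COPY(func_name, copy_func)			\
size_t func_name(struct toy_handle *h, const void *src, size_t len)	\
{									\
	size_t n = len < h->size ? len : h->size;			\
	size_t left = copy_func(h->addr, src, n);			\
	size_t done = n - left;						\
									\
	h->addr += done;						\
	h->size -= done;						\
	return len - done;						\
}

/* Plain in-kernel style copy: never fails. */
size_t toy_memcpy(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/* Simulated user-pointer copy that "faults" after at most 3 bytes. */
size_t toy_copy_partial(void *dst, const void *src, size_t n)
{
	size_t ok = n < 3 ? n : 3;

	memcpy(dst, src, ok);
	return n - ok;
}

DEFINE_TOY_OUTPUT_COPY(toy_output_copy, toy_memcpy)
DEFINE_TOY_OUTPUT_COPY(toy_output_copy_user, toy_copy_partial)
```

Both generated functions share the bookkeeping on the handle; only the primitive that moves the bytes differs, which is what lets a user-pointer variant reuse the same output path.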
  18. 02 Jan, 2012 1 commit
  19. 05 Dec, 2011 1 commit
    • P
      perf: Fix loss of notification with multi-event · 10c6db11
      Peter Zijlstra committed
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10s at
      1000 samples/sec. However, this is not what's happening. You
      get much fewer samples, maybe 3700 samples/event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On an Intel Nehalem or even AMD64, there are 4 counters capable
      of measuring cycles, so there is plenty of space to measure those
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we'd expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straightforward solution of moving the waitq into the
      ringbuffer object doesn't work because of life-time issues. One could
      perf_event_set_output() on a fd that you're also blocking on and cause
      the old rb object to be freed while its waitq would still be
      referenced by the blocked thread -> FAIL.
      
      Therefore link all events to the ringbuffer and broadcast the wakeup
      from the ringbuffer object to all possible events that could be waited
      upon. This is rather ugly, and we're open to better solutions but it
      works for now.
      Reported-by: Stephane Eranian <eranian@google.com>
      Finished-by: Stephane Eranian <eranian@google.com>
      Reviewed-by: Stephane Eranian <eranian@google.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      10c6db11
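The "link all events to the ringbuffer and broadcast the wakeup" approach from this commit can be illustrated with a small model. The types and functions below (`struct toy_event`, `struct toy_rb`, `toy_rb_attach()`, `toy_rb_wakeup()`) are hypothetical, and a counter stands in for waking an event's wait queue.

```c
#include <stddef.h>

/* Hypothetical event: the counter stands in for waking its wait queue. */
struct toy_event {
	int wakeups;			/* times this event's waiters were woken */
	struct toy_event *next;		/* next event attached to the same buffer */
};

/* Hypothetical ring buffer that tracks every event writing into it. */
struct toy_rb {
	struct toy_event *events;
};

/* Link an event to the buffer it outputs into. */
void toy_rb_attach(struct toy_rb *rb, struct toy_event *ev)
{
	ev->next = rb->events;
	rb->events = ev;
}

/*
 * Broadcast: wake every attached event, so a poller waiting on any one
 * of their file descriptors is notified, whichever event filled the buffer.
 */
void toy_rb_wakeup(struct toy_rb *rb)
{
	for (struct toy_event *ev = rb->events; ev; ev = ev->next)
		ev->wakeups++;
}
```

The wait queues stay with the events, which sidesteps the lifetime problem described above, at the cost of walking the list on each wakeup.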
  20. 01 Jul, 2011 3 commits
    • P
      perf: Remove the perf_output_begin(.sample) argument · a7ac67ea
      Peter Zijlstra committed
      Since only samples call perf_output_sample() it's much saner (and more
      correct) to put the sample logic in there than in the
      perf_output_begin()/perf_output_end() pair.
      
      Saves a useless argument, reduces conditionals and shrinks
      struct perf_output_handle, win!
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a7ac67ea
    • P
      perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Peter Zijlstra committed
      The nmi parameter indicated if we could do wakeups from the current
      context, if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      a8b0ca17
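The "set some state and let a later interrupt do the wakeup" scheme that this commit makes unconditional might be modelled like this. It is a deliberately simplified sketch: `struct toy_event`, `toy_request_wakeup()` and `toy_irq_work_run()` are hypothetical names, and the real code queues work through the kernel's irq_work machinery rather than relying on an explicit call.

```c
#include <stdbool.h>

/* Hypothetical event state for the deferred-wakeup sketch. */
struct toy_event {
	bool pending_wakeup;	/* a wakeup was requested but not yet delivered */
	int wakeups;		/* wakeups actually delivered */
};

/* Safe from any context (including NMI in this model): only records intent. */
void toy_request_wakeup(struct toy_event *ev)
{
	ev->pending_wakeup = true;
}

/* Runs later, from a context where waking a task is allowed. */
void toy_irq_work_run(struct toy_event *ev)
{
	if (ev->pending_wakeup) {
		ev->pending_wakeup = false;
		ev->wakeups++;
	}
}
```

With every caller taking the deferred path there is no per-call decision left to make, which is why the nmi parameter and its conditionals can go away; the delivery latency then depends on how soon the deferred work runs.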
    • V
      perf_events: Fix perf buffer watermark setting · 4ec8363d
      Vince Weaver committed
      Since 2.6.36 (specifically commit d57e34fd ("perf: Simplify the
      ring-buffer logic: make perf_buffer_alloc() do everything needed")),
      the perf_buffer_init_code() has been mis-setting the buffer watermark
      if perf_event_attr.wakeup_events has a non-zero value.
      
      This is because perf_event_attr.wakeup_events is a union with
      perf_event_attr.wakeup_watermark.
      
      This commit re-enables the check for perf_event_attr.watermark being
      set before continuing with setting a non-default watermark.
      
      This bug is most noticeable when you are trying to use PERF_IOC_REFRESH
      with a value larger than one and perf_event_attr.wakeup_events is set to
      one.  In this case the buffer watermark will be set to 1 and you will
      get extraneous POLL_IN overflows rather than POLL_HUP as expected.
      
      [ avoid using attr.wakeup_events when attr.watermark is set ]
      Signed-off-by: Vince Weaver <vweaver1@eecs.utk.edu>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.edu
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      4ec8363d
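The union pitfall described in this commit can be shown with a cut-down model. `struct toy_attr` and `toy_rb_watermark()` are hypothetical; only the `watermark`, `wakeup_events` and `wakeup_watermark` field names come from the message above, and the fallback of half the buffer size is an assumption of this sketch.

```c
#include <stdint.h>

/* Cut-down, hypothetical model of the relevant attribute fields. */
struct toy_attr {
	unsigned int watermark : 1;	/* selects which union member is meaningful */
	union {
		uint32_t wakeup_events;		/* wake up every N events */
		uint32_t wakeup_watermark;	/* wake up after N bytes */
	};
};

/*
 * Pick the buffer watermark in bytes. The union value is only read as a
 * byte count when attr->watermark is set; otherwise it is an event count
 * and must be ignored here. Falling back to buf_size / 2 is an assumption.
 */
long toy_rb_watermark(const struct toy_attr *attr, long buf_size)
{
	long wm = attr->watermark ? (long)attr->wakeup_watermark : 0;

	if (!wm)
		wm = buf_size / 2;
	return wm;
}
```

Without the `attr->watermark` check, `wakeup_events = 1` would be misread as a 1-byte watermark, which matches the spurious wakeups the commit describes.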
  21. 09 Jun, 2011 1 commit