1. 06 11月, 2013 6 次提交
    • P
      perf: Update a stale comment · 394570b7
      Peter Zijlstra 提交于
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-9s5mze78gmlz19agt39i8rii@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      394570b7
    • P
      perf: Optimize perf_output_begin() -- address calculation · 524feca5
      Peter Zijlstra 提交于
      Rewrite the handle address calculation code to be clearer.
      
      Saves 8 bytes on x86_64-defconfig.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-3trb2n2henb9m27tncef3ag7@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      524feca5
    • P
      perf: Optimize perf_output_begin() -- lost_event case · d20a973f
      Peter Zijlstra 提交于
      Avoid touching the lost_event and sample_data cachelines twince. Its
      not like we end up doing less work, but it might help to keep all
      accesses to these cachelines in one place.
      
      Due to code shuffle, this looses 4 bytes on x86_64-defconfig.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-zfxnc58qxj0eawdoj31hhupv@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d20a973f
    • P
      perf: Optimize perf_output_begin() · 85f59edf
      Peter Zijlstra 提交于
      There's no point in re-doing the memory-barrier when we fail the
      cmpxchg(). Also placing it after the space reservation loop makes it
      clearer it only separates the userpage->tail read from the data
      stores.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-c19u6egfldyx86tpyc3zgkw9@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      85f59edf
    • P
      perf: Add unlikely() to the ring-buffer code · c72b42a3
      Peter Zijlstra 提交于
      Add unlikely() annotations to 'slow' paths:
      
      When having a sampling event but no output buffer; you have bigger
      issues -- also the bail is still faster than actually doing the work.
      
      When having a sampling event but a control page only buffer, you have
      bigger issues -- again the bail is still faster than actually doing
      work.
      
      Optimize for the case where you're not loosing events -- again, not
      doing the work is still faster but make sure that when you have to
      actually do work its as fast as possible.
      
      The typical watermark is 1/2 the buffer size, so most events will not
      take this path.
      
      Shrinks perf_output_begin() by 16 bytes on x86_64-defconfig.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-wlg3jew3qnutm8opd0hyeuwn@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      c72b42a3
    • P
      perf: Simplify the ring-buffer code · 26c86da8
      Peter Zijlstra 提交于
      By using CIRC_SPACE() we can obviate the need for perf_output_space().
      
      Shrinks the size of perf_output_begin() by 17 bytes on
      x86_64-defconfig.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: james.hogan@imgtec.com
      Cc: Vince Weaver <vince@deater.net>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Anton Blanchard <anton@samba.org>
      Link: http://lkml.kernel.org/n/tip-vtb0xb0llebmsdlfn1v5vtfj@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      26c86da8
  2. 29 10月, 2013 1 次提交
  3. 01 5月, 2013 1 次提交
  4. 21 3月, 2013 1 次提交
    • S
      perf: Fix ring_buffer perf_output_space() boundary calculation · dd9c086d
      Stephane Eranian 提交于
      This patch fixes a flaw in perf_output_space(). In case the size
      of the space needed is bigger than the actual buffer size, there
      may be situations where the function would return true (i.e.,
      there is space) when it should not. head > offset due to
      rounding of the masking logic.
      
      The problem can be tested by activating BTS on Intel processors.
      A BTS record can be as big as 16 pages. The following command
      fails:
      
        $ perf record -m 4 -c 1 -e branches:u my_test_program
      
      You will get a buffer corruption with this. Perf report won't be
      able to parse the perf.data.
      
      The fix is to first check that the requested space is smaller
      than the buffer size. If so, then the masking logic will work
      fine. If not, then there is no chance the record can be saved
      and it will be gracefully handled by upper code layers.
      
      [ In v2, we also make the logic for the writable more explicit by
        renaming it to rb->overwrite because it tells whether or not the
        buffer can overwrite its tail (suggested by PeterZ). ]
      Signed-off-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: peterz@infradead.org
      Cc: jolsa@redhat.com
      Cc: fweisbec@gmail.com
      Link: http://lkml.kernel.org/r/20130318133327.GA3056@quadSigned-off-by: NIngo Molnar <mingo@kernel.org>
      dd9c086d
  5. 10 8月, 2012 2 次提交
    • J
      perf: Add perf_output_skip function to skip bytes in sample · 5685e0ff
      Jiri Olsa 提交于
      Introducing perf_output_skip function to be able to skip data within the
      perf ring buffer.
      
      When writing data into perf ring buffer we first reserve needed place in
      ring buffer and then copy the actual data.
      
      There's a possibility we won't be able to fill all the reserved size
      with data, so we need a way to skip the remaining bytes.
      
      This is going to be useful when storing the user stack dump, where we
      might end up with less data than we originally requested.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-5-git-send-email-jolsa@redhat.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5685e0ff
    • F
      perf: Factor __output_copy to be usable with specific copy function · 91d7753a
      Frederic Weisbecker 提交于
      Adding a generic way to use __output_copy function with specific copy
      function via DEFINE_PERF_OUTPUT_COPY macro.
      
      Using this to add new __output_copy_user function, that provides output
      copy from user pointers. For x86 the copy_from_user_nmi function is used
      and __copy_from_user_inatomic for the rest of the architectures.
      
      This new function will be used in user stack dump on sample, coming in
      next patches.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Arun Sharma <asharma@fb.com>
      Cc: Benjamin Redelings <benjamin.redelings@nescent.org>
      Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Frank Ch. Eigler <fche@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Ulrich Drepper <drepper@gmail.com>
      Link: http://lkml.kernel.org/r/1344345647-11536-4-git-send-email-jolsa@redhat.comSigned-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      91d7753a
  6. 02 1月, 2012 1 次提交
  7. 05 12月, 2011 1 次提交
    • P
      perf: Fix loss of notification with multi-event · 10c6db11
      Peter Zijlstra 提交于
      When you do:
              $ perf record -e cycles,cycles,cycles noploop 10
      
      You expect about 10,000 samples for each event, i.e., 10s at
      1000samples/sec. However, this is not what's happening. You
      get much fewer samples, maybe 3700 samples/event:
      
      $ perf report -D | tail -15
      Aggregated stats:
                 TOTAL events:      10998
                  MMAP events:         66
                  COMM events:          2
                SAMPLE events:      10930
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      cycles stats:
                 TOTAL events:       3642
                SAMPLE events:       3642
      cycles stats:
                 TOTAL events:       3644
                SAMPLE events:       3644
      
      On a Intel Nehalem or even AMD64, there are 4 counters capable
      of measuring cycles, so there is plenty of space to measure those
      events without multiplexing (even with the NMI watchdog active).
      And even with multiplexing, we'd expect roughly the same number
      of samples per event.
      
      The root of the problem was that when the event that caused the buffer
      to become full was not the first event passed on the cmdline, the user
      notification would get lost. The notification was sent to the file
      descriptor of the overflowed event but the perf tool was not polling
      on it.  The perf tool aggregates all samples into a single buffer,
      i.e., the buffer of the first event. Consequently, it assumes
      notifications for any event will come via that descriptor.
      
      The seemingly straight forward solution of moving the waitq into the
      ringbuffer object doesn't work because of life-time issues. One could
      perf_event_set_output() on a fd that you're also blocking on and cause
      the old rb object to be freed while its waitq would still be
      referenced by the blocked thread -> FAIL.
      
      Therefore link all events to the ringbuffer and broadcast the wakeup
      from the ringbuffer object to all possible events that could be waited
      upon. This is rather ugly, and we're open to better solutions but it
      works for now.
      Reported-by: NStephane Eranian <eranian@google.com>
      Finished-by: NStephane Eranian <eranian@google.com>
      Reviewed-by: NStephane Eranian <eranian@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20111126014731.GA7030@quadSigned-off-by: NIngo Molnar <mingo@elte.hu>
      10c6db11
  8. 01 7月, 2011 3 次提交
    • P
      perf: Remove the perf_output_begin(.sample) argument · a7ac67ea
      Peter Zijlstra 提交于
      Since only samples call perf_output_sample() its much saner (and more
      correct) to put the sample logic in there than in the
      perf_output_begin()/perf_output_end() pair.
      
      Saves a useless argument, reduces conditionals and shrinks
      struct perf_output_handle, win!
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/n/tip-2crpvsx3cqu67q3zqjbnlpsc@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a7ac67ea
    • P
      perf: Remove the nmi parameter from the swevent and overflow interface · a8b0ca17
      Peter Zijlstra 提交于
      The nmi parameter indicated if we could do wakeups from the current
      context, if not, we would set some state and self-IPI and let the
      resulting interrupt do the wakeup.
      
      For the various event classes:
      
        - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
          the PMI-tail (ARM etc.)
        - tracepoint: nmi=0; since tracepoint could be from NMI context.
        - software: nmi=[0,1]; some, like the schedule thing cannot
          perform wakeups, and hence need 0.
      
      As one can see, there is very little nmi=1 usage, and the down-side of
      not using it is that on some platforms some software events can have a
      jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).
      
      The up-side however is that we can remove the nmi parameter and save a
      bunch of conditionals in fast paths.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Michael Cree <mcree@orcon.net.nz>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Eric B Munson <emunson@mgebm.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      a8b0ca17
    • V
      perf_events: Fix perf buffer watermark setting · 4ec8363d
      Vince Weaver 提交于
      Since 2.6.36 (specifically commit d57e34fd ("perf: Simplify the
      ring-buffer logic: make perf_buffer_alloc() do everything needed"),
      the perf_buffer_init_code() has been mis-setting the buffer watermark
      if perf_event_attr.wakeup_events has a non-zero value.
      
      This is because perf_event_attr.wakeup_events is a union with
      perf_event_attr.wakeup_watermark.
      
      This commit re-enables the check for perf_event_attr.watermark being
      set before continuing with setting a non-default watermark.
      
      This bug is most noticable when you are trying to use PERF_IOC_REFRESH
      with a value larger than one and perf_event_attr.wakeup_events is set to
      one.  In this case the buffer watermark will be set to 1 and you will
      get extraneous POLL_IN overflows rather than POLL_HUP as expected.
      
      [ avoid using attr.wakeup_events when attr.watermark is set ]
      Signed-off-by: NVince Weaver <vweaver1@eecs.utk.edu>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.00.1106011506390.5384@cl320.eecs.utk.eduSigned-off-by: NIngo Molnar <mingo@elte.hu>
      4ec8363d
  9. 09 6月, 2011 1 次提交