1. 08 Feb 2011 (1 commit)
    • tracing/syscalls: Don't add events for unmapped syscalls · ba976970
      Ian Munsie authored
      FTRACE_SYSCALLS would create events for each and every system call, even
      if it had failed to map the system call's name with its number. This
      resulted in a number of events being created that would not behave as
      expected.
      
      This could happen, for example, on architectures whose symbol names are
      unusual and will not match the system call name. It could also happen
      with system calls which were mapped to sys_ni_syscall.
      
      This patch changes the default system call number in the metadata to -1.
      If the system call name from the metadata is not successfully mapped to
      a system call number during boot, then the event initialisation routine
      will now return an error, preventing the event from being created.
      Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
      LKML-Reference: <1296703645-18718-2-git-send-email-imunsie@au1.ibm.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
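      A minimal sketch of the check this describes, assuming the
      syscall_metadata/ftrace_event_call layout of that era (not the literal
      patch):

        /* Sketch: the metadata's syscall_nr now defaults to -1; event init
         * refuses to create an event whose name was never mapped. */
        static int init_syscall_trace(struct ftrace_event_call *call)
        {
            int num = ((struct syscall_metadata *)call->data)->syscall_nr;

            if (num < 0 || num >= NR_syscalls) {
                pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
                         ((struct syscall_metadata *)call->data)->name);
                return -ENOSYS;
            }

            /* ... the rest of the original initialisation ... */
            return 0;
        }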
  2. 03 Feb 2011 (2 commits)
    • tracing: Replace syscall_meta_data struct array with pointer array · 3d56e331
      Steven Rostedt authored
      Currently the syscall_meta structures for the syscall tracepoints are
      placed in the __syscall_metadata section, and at link time, the linker
      makes one large array of all these syscall metadata structures. On boot
      up, this array is read (much like the initcall sections) and the syscall
      data is processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the __syscall_metadata section. As pointers always have
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The __syscall_metadata section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Acked-by: David S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
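      Roughly, the generated per-syscall definition changes from emitting the
      structure itself into the section to emitting only a pointer to it; a
      sketch under that assumption ("sys_foo" is a made-up example syscall,
      and the section name follows the prose above):

        /* Previously (sketch) the struct itself carried the section attribute,
         * so the objects themselves had to form an array. Now the struct is
         * ordinary data and only its address goes into the section; pointers
         * are naturally aligned, so the linked output really is a tight array. */
        static struct syscall_metadata __syscall_meta_sys_foo = {
            .name       = "sys_foo",
            .syscall_nr = -1,   /* unmapped until boot resolves it */
        };

        static struct syscall_metadata __used
        __attribute__((section("__syscall_metadata")))
        *__p_syscall_meta_sys_foo = &__syscall_meta_sys_foo;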
    • tracing: Replace trace_event struct array with pointer array · e4a9ea5e
      Steven Rostedt authored
      Currently the trace_event structures are placed in the _ftrace_events
      section, and at link time, the linker makes one large array of all
      the trace_event structures. On boot up, this array is read (much like
      the initcall sections) and the events are processed.
      
      The problem is that there is no guarantee that gcc will place complex
      structures nicely together in an array format. Two structures in the
      same file may be placed awkwardly, because gcc has no clue that they
      are supposed to be in an array.
      
      A hack was previously used to force the alignment to 4, to pack the
      structures together. But this caused alignment issues with other
      architectures (sparc).
      
      Instead of packing the structures into an array, the structures' addresses
      are now put into the _ftrace_events section. As pointers always have
      natural alignment, gcc should always pack them tightly together
      (otherwise initcall, extable, etc would also fail).
      
      By having the pointers to the structures in the section, we can still
      iterate the trace_events without causing unnecessary alignment problems
      with other architectures, or depending on the current behaviour of
      gcc that will likely change in the future just to tick us kernel developers
      off a little more.
      
      The _ftrace_events section is also moved into the .init.data section
      as it is now only needed at boot up.
      Suggested-by: David Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
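      Boot-up code then walks an array of pointers instead of an array of
      structures; a hedged sketch, assuming the usual linker-provided
      start/stop symbols and the ftrace_event_call type of that era
      (process_event() is a hypothetical stand-in for the real setup):

        extern struct ftrace_event_call *__start_ftrace_events[];
        extern struct ftrace_event_call *__stop_ftrace_events[];

        static int __init init_trace_events(void)
        {
            struct ftrace_event_call **iter;

            /* each section entry is now a pointer, not the struct itself */
            for (iter = __start_ftrace_events; iter < __stop_ftrace_events; iter++)
                process_event(*iter);   /* hypothetical per-event setup */

            return 0;
        }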
  3. 20 Jan 2011 (1 commit)
    • lockdep: Move early boot local IRQ enable/disable status to init/main.c · 2ce802f6
      Tejun Heo authored
      During early boot, local IRQs are disabled until the IRQ subsystem is
      properly initialized.  During this time, no one should enable
      local IRQs, and some operations which usually are not allowed with
      IRQs disabled, e.g. operations which might sleep or require
      communication with other processors, are allowed.
      
      lockdep tracked this with the early_boot_irqs_off/on() callbacks.
      As other subsystems need this information too, move it to
      init/main.c and make it generally available.  While at it,
      invert the sense of the boolean to early_boot_irqs_disabled instead
      of enabled, so that it can be initialized to false and true
      indicates the exceptional condition.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110120110635.GB6036@htj.dyndns.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
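      A condensed sketch of the resulting arrangement in init/main.c, as
      described above (heavily simplified; not the full start_kernel()):

        /* Defaults to false; true marks the exceptional early-boot window. */
        bool early_boot_irqs_disabled __read_mostly;

        asmlinkage void __init start_kernel(void)
        {
            local_irq_disable();
            early_boot_irqs_disabled = true;

            /* Early init: no one may enable local IRQs here, although
             * sleeping or talking to other processors is tolerated. */

            early_boot_irqs_disabled = false;
            local_irq_enable();

            /* ... rest of boot ... */
        }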
  4. 15 Jan 2011 (1 commit)
    • tracing: Remove syscall_exit_fields · 7f85803a
      Lai Jiangshan authored
      There is no need for syscall_exit_fields as the syscall
      exit event class can already host the fields in its structure,
      like most other trace events do by default. Use that
      default behavior instead.
      
      Following this scheme, we no longer need to override the
      get_fields() callback of the syscall exit event class either.
      
      Hence both syscall_exit_fields and syscall_get_exit_fields() can
      be removed.
      
      Also changed some indentation to keep the following under 80
      characters:
      
      ".fields		= LIST_HEAD_INIT(event_class_syscall_exit.fields),"
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D301C0E.8090408@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
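      After the change the exit event class simply hosts the field list itself,
      roughly like this (a sketch built around the line quoted above; the other
      members are elided):

        /* No private syscall_exit_fields list and no get_fields() override;
         * the class's own .fields list head is used, as most events do. */
        struct ftrace_event_class event_class_syscall_exit = {
            .system  = "syscalls",
            .fields  = LIST_HEAD_INIT(event_class_syscall_exit.fields),
            /* .reg, .define_fields, .raw_init, ... unchanged */
        };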
  5. 10 Jan 2011 (1 commit)
  6. 08 Jan 2011 (2 commits)
  7. 07 Jan 2011 (2 commits)
  8. 04 Jan 2011 (2 commits)
  9. 24 Dec 2010 (1 commit)
    • ring_buffer: Off-by-one and duplicate events in ring_buffer_read_page · e1e35927
      David Sharp authored
      Fix two related problems in the event-copying loop of
      ring_buffer_read_page.
      
      The loop condition for copying events is off-by-one.
      "len" is the remaining space in the caller-supplied page.
      "size" is the size of the next event (or two events).
      If len == size, then there is just enough space for the next event.
      
      size was set to rb_event_ts_length(), which may include the size of two
      events if the first event is a time-extend, in order to ensure a time-
      extend is kept together with the event after it. However,
      rb_advance_reader() always advances by one event. This would result in the
      event after any time-extend being duplicated. Instead, get the size of
      a single event for the memcpy, but use rb_event_ts_length for the loop
      condition.
      Signed-off-by: David Sharp <dhsharp@google.com>
      LKML-Reference: <1293064704-8101-1-git-send-email-dhsharp@google.com>
      LKML-Reference: <AANLkTin7nLrRPc9qGjdjHbeVDDWiJjAiYyb-L=gH85bx@mail.gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
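      The shape of the fixed loop, as a simplified sketch (the helper names are
      the ones mentioned above; the surrounding variables belong to
      ring_buffer_read_page() and other details are elided):

        size = rb_event_ts_length(event);       /* may cover two events */

        while (len >= size) {                   /* len == size still fits */
            /* copy and advance by only the one event that
             * rb_advance_reader() will actually consume */
            size = rb_event_length(event);
            memcpy(bpage->data + pos, rpage->data + rpos, size);

            len -= size;
            pos += size;

            rb_advance_reader(cpu_buffer);
            rpos = reader->read;
            if (rpos >= commit)
                break;

            event = rb_reader_event(cpu_buffer);
            size = rb_event_ts_length(event);   /* back to the combined length */
        }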
  10. 01 Dec 2010 (1 commit)
  11. 19 Nov 2010 (1 commit)
    • tracing/events: Show real number in array fields · 04295780
      Steven Rostedt authored
      Currently, in something like the sched_switch event, we have:
      
        field:char prev_comm[TASK_COMM_LEN];	offset:12;	size:16;	signed:1;
      
      When a userspace tool such as perf tries to parse this,
      TASK_COMM_LEN is meaningless. This happens because the TRACE_EVENT() macro
      simply uses #len to produce the string form of the length. When the length
      is an enum, we get a string that means nothing to tools.
      
      By adding a static buffer and a mutex to protect it, we can store the
      string into that buffer with snprintf and show the actual number.
      Now we get:
      
        field:char prev_comm[16];       offset:12;      size:16;        signed:1;
      
      Something much more useful.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
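      The mechanism is roughly the following (a hedged sketch of the __array()
      handling in the event macros; buffer and mutex names may not match the
      actual implementation, and ret, event_call and field come from the
      surrounding generated code):

        /* Format the real length into a shared static buffer, serialized by a
         * mutex, so the field string reads "char prev_comm[16]" instead of
         * "char prev_comm[TASK_COMM_LEN]". */
        static char event_storage[128];
        static DEFINE_MUTEX(event_storage_mutex);

        #undef __array
        #define __array(type, item, len)                                      \
            do {                                                              \
                mutex_lock(&event_storage_mutex);                             \
                snprintf(event_storage, sizeof(event_storage),                \
                         "%s[%d]", #type, len);                               \
                ret = trace_define_field(event_call, event_storage, #item,    \
                                         offsetof(typeof(field), item),       \
                                         sizeof(field.item),                  \
                                         is_signed_type(type), FILTER_OTHER); \
                mutex_unlock(&event_storage_mutex);                           \
                if (ret)                                                      \
                    return ret;                                               \
            } while (0);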
  12. 18 Nov 2010 (2 commits)
  13. 16 Nov 2010 (1 commit)
  14. 13 Nov 2010 (1 commit)
    • tracing: Fix recursive user stack trace · 91e86e56
      Steven Rostedt authored
      The user stack trace can fault when examining the trace, which
      would call the do_page_fault handler, which would trace again,
      which would do the user stack trace, which would fault and call
      do_page_fault again ...
      
      Thus this is causing a recursive bug. We need to have a recursion
      detector here.
      
      [ Resubmitted by Jiri Olsa ]
      
      [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
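      A sketch of the guard, using the __this_cpu_* accessors mentioned above
      (simplified from the shape of ftrace_trace_userstack(); details elided):

        static DEFINE_PER_CPU(int, user_stack_count);

        void ftrace_trace_userstack(struct ring_buffer *buffer,
                                    unsigned long flags, int pc)
        {
            /* prevent recursion: fault -> do_page_fault -> trace -> fault ... */
            preempt_disable();
            if (__this_cpu_read(user_stack_count))
                goto out;

            __this_cpu_inc(user_stack_count);

            /* ... reserve the event and save the user stack (may fault) ... */

            __this_cpu_dec(user_stack_count);
        out:
            preempt_enable();
        }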
  15. 10 Nov 2010 (2 commits)
    • block: remove REQ_HARDBARRIER · 02e031cb
      Christoph Hellwig authored
      REQ_HARDBARRIER is dead now, so remove the leftovers.  What's left
      at this point is:
      
       - various checks inside the block layer.
       - sanity checks in bio based drivers.
       - now unused bio_empty_barrier helper.
       - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it has been dead for a while,
         but Xen really needs to sort out its barrier situation.
       - setting of ordered tags in uas - dead code copied from old scsi
         drivers.
       - the different scsi retry handling for barriers - it's dead and should have been
         removed when flushes were converted to FS requests.
       - blktrace handling of barriers - removed.  Someone who knows blktrace
         better should add support for REQ_FLUSH and REQ_FUA, though.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
    • [S390] ftrace: build without frame pointers on s390 · becf91f1
      Heiko Carstens authored
      s390 doesn't need FRAME_POINTERS in order to have a working function tracer.
      We don't need frame pointers in order to get stack traces, since we always
      have valid backchains by using the -mkernel-backchain gcc option.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  16. 02 Nov 2010 (1 commit)
  17. 28 Oct 2010 (1 commit)
  18. 24 Oct 2010 (1 commit)
  19. 23 Oct 2010 (2 commits)
  20. 21 Oct 2010 (6 commits)
    • tracing: Do not limit the size of the number of CPU buffers · dd49a38c
      Steven Rostedt authored
      The tracing per_cpu buffers were limited to 999 CPUs for a mere
      savings in stack space of a char array. Up the array to 30 characters,
      which is more than enough to hold a 64-bit number.
      Reported-by: Robin Holt <holt@sgi.com>
      Suggested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Remove unused macro RB_TIMESTAMPS_PER_PAGE · b8b2663b
      Steven Rostedt authored
      With the binding of time extends to events we no longer need to use
      the macro RB_TIMESTAMPS_PER_PAGE. Remove it.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Micro-optimize with some strategic inlining · d9abde21
      Steven Rostedt authored
      By using inline and noinline, we are able to make the fast path of
      recording an event 4% faster.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Remove condition to add timestamp in fast path · 140ff891
      Steven Rostedt authored
      There's a condition to check if we should add a time extend or
      not in the fast path. But this condition is racy (in the sense
      that we can add an unnecessary time extend, but nothing that
      can break anything). We later check if the time or event time
      delta should be zero or have real data in it (not racy), making
      this first check redundant.
      
      This check may help save space once in a while, but really is
      not worth the hassle of trying to save some space for something
      that happens at most once every 134 ms.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Bind time extend and data events together · 69d1b839
      Steven Rostedt authored
      When the time between two timestamps is greater than
      2^27 nanosecs (~134 ms) a time extend event is added that extends
      the time difference to 59 bits (~18 years). This is due to
      events only having a 27 bit field to store time.
      
      Currently this time extend is a separate event. We add it just before
      the event data that is being written to the buffer. But before
      the event data is committed, the event data can also be discarded (as
      with the case of filters). But because the time extend has already been
      committed, it will stay in the buffer.
      
      If lots of events are being filtered and no event is being
      written, then every 134ms a time extend can be added to the buffer
      without any data attached. To keep from filling the entire buffer
      with time extends, a time extend will never be the first event
      in a page because the page timestamp can be used. Time extends can
      only fill the rest of a page with some data at the beginning.
      
      This patch binds the time extend with the data. The difference here
      is that the time extend is not committed before the data is added.
      Instead, when a time extend is needed, the space reserved on
      the ring buffer is the time extend + the data event size. The
      time extend is added to the first part of the reserved block and
      the data is added to the second. The time extend event is passed
      back to the reserver, but since the reserver also uses a function
      to find the data portion of the reserved block, no changes to the
      ring buffer interface need to be made.
      
      When a commit is discarded, we now remove both the time extend and
      the event. With this approach no more than one time extend can
      be in the buffer in a row. Data must always follow a time extend.
      
      Thanks to Mathieu Desnoyers for suggesting this idea.
      Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
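      In rough terms the reservation now works like this (a conceptual sketch
      with hypothetical helper names, not the actual ring-buffer code;
      RB_LEN_TIME_EXTEND is the existing size constant):

        /* Reserve one block big enough for the time extend plus the data, fill
         * the extend into the first part and hand back the data part, so that
         * discarding the commit discards both together. */
        static void *reserve_with_time_extend(struct ring_buffer_per_cpu *cpu_buffer,
                                              unsigned long data_len, u64 delta)
        {
            unsigned long length = data_len;
            bool add_timestamp = delta >= (1ULL << 27);   /* ~134 ms */
            void *block;

            if (add_timestamp)
                length += RB_LEN_TIME_EXTEND;

            block = reserve_block(cpu_buffer, length);    /* hypothetical */
            if (!block)
                return NULL;

            if (add_timestamp)
                block = write_time_extend(block, delta);  /* hypothetical: fills
                                                             the extend, returns
                                                             the data part */
            return block;   /* the caller writes its event data here */
        }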
    • ring-buffer: Pass delta by value and not by reference · f25106ae
      Steven Rostedt authored
      The delta between events is passed to the timestamp code by reference,
      and the timestamp code will reset the value. But it can just as well be
      reset by the caller. There is no need to pass it in by reference.
      
      Changing the call to pass by value lets gcc optimize the code
      a bit more: it can keep the delta in a register and not
      worry about updating the reference.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
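      The change is purely one of calling convention; an illustrative
      before/after (function names are hypothetical):

        /* Before: the callee writes through the pointer, so the caller has to
         * keep delta in memory across the call. */
        static int rb_calc_timestamp_ref(struct ring_buffer_per_cpu *cpu_buffer,
                                         u64 *delta);

        /* After: delta travels by value; the caller resets its own copy and
         * gcc is free to keep it in a register. */
        static int rb_calc_timestamp(struct ring_buffer_per_cpu *cpu_buffer,
                                     u64 delta);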
  21. 20 Oct 2010 (2 commits)
    • ring-buffer: Pass timestamp by value and not by reference · e8bc43e8
      Steven Rostedt authored
      The original code for the ring buffer had locations that modified
      the timestamp and that change was used by the callers. Now,
      the timestamp is not reused by the callers and there is no reason
      to pass it by reference.
      
      Changing the call to pass by value lets gcc optimize the code
      a bit more: it can keep the timestamp in a register and not
      worry about updating the reference.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Make write slow path out of line · 747e94ae
      Steven Rostedt authored
      Gcc inlines the slow path of the ring buffer write, which can
      hurt performance. This patch simply forces the slow path function
      rb_move_tail() to always stay a separate, non-inlined function.
      
      The ring_buffer_benchmark module with reader_disabled=1 shows that
      this patch changes the time to record an event from 135 ns to
      132 ns. (3 ns or 2.22% improvement)
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
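      The whole change amounts to one annotation; a sketch assuming the
      rb_move_tail() parameter list of that era (declaration only, body elided):

        /* Marking the slow path noinline keeps it out of the hot reservation
         * path that gcc would otherwise bloat by inlining it. */
        static noinline struct ring_buffer_event *
        rb_move_tail(struct ring_buffer_per_cpu *cpu_buffer,
                     unsigned long length, unsigned long tail,
                     struct buffer_page *tail_page, u64 ts);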
  22. 19 Oct 2010 (2 commits)
  23. 18 Oct 2010 (4 commits)
    • tracing: Remove parent recording in latency tracer graph options · 78c89ba1
      Steven Rostedt authored
      Even though the parent is recorded with the normal function tracing
      of the latency tracers (irqsoff and wakeup), the function graph
      recording is bogus.
      
      This is due to the function graph tracer messing with the return stack.
      The latency tracers pass in CALLER_ADDR0 as the parent, which
      works fine for plain function tracing. But this causes bogus output
      with the graph tracer:
      
       3)    <idle>-0    |  d.s3.  0.000 us    |  return_to_handler();
       3)    <idle>-0    |  d.s3.  0.000 us    |  _raw_spin_unlock_irqrestore();
       3)    <idle>-0    |  d.s3.  0.000 us    |  return_to_handler();
       3)    <idle>-0    |  d.s3.  0.000 us    |  trace_hardirqs_on();
      
      The "return_to_handle()" call is the trampoline of the
      function graph tracer, and is meaningless in this context.
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Use one prologue for the preempt irqs off tracer function tracers · 5e6d2b9c
      Steven Rostedt authored
      The preempt and irqsoff tracers have three types of function tracers:
      the normal function tracer, function graph entry, and function graph return.
      Each of these uses a complex dance to prevent recursion and to decide whether
      to trace the data or not (depending on whether interrupts are enabled).
      
      This patch moves the duplicate code into a single routine, to
      prevent future mistakes with modifying duplicate complex code.
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
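      The shape of the shared routine (a simplified sketch of the idea; the
      actual helper in trace_irqsoff.c may differ in detail):

        /* One prologue decides, for all three callbacks, whether this
         * invocation should be traced, and takes the per-CPU "disabled"
         * reference if so. Returns 1 when tracing may proceed. */
        static int func_prolog_dec(struct trace_array *tr,
                                   struct trace_array_cpu **data,
                                   unsigned long *flags)
        {
            int cpu = raw_smp_processor_id();
            long disabled;

            if (likely(!per_cpu(tracing_cpu, cpu)))
                return 0;

            local_save_flags(*flags);
            if (!irqs_disabled_flags(*flags))
                return 0;

            *data = tr->data[cpu];
            disabled = atomic_inc_return(&(*data)->disabled);
            if (likely(disabled == 1))
                return 1;

            atomic_dec(&(*data)->disabled);
            return 0;
        }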
    • tracing: Use one prologue for the wakeup tracer function tracers · 542181d3
      Steven Rostedt authored
      The wakeup tracer has three types of function tracers: the normal
      function tracer, function graph entry, and function graph return.
      Each of these uses a complex dance to prevent recursion and to decide whether
      to trace the data or not (depending on the wake_task variable).
      
      This patch moves the duplicate code into a single routine, to
      prevent future mistakes with modifying duplicate complex code.
      
      Cc: Jiri Olsa <jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Graph support for wakeup tracer · 7495a5be
      Jiri Olsa authored
      Add function graph support for the wakeup latency tracer.
      The graph output is enabled by setting the 'display-graph'
      trace option.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1285243253-7372-4-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>