提交 · a6f0eb6adc42e5eed3f35af99c61c0e411b16f8e · openanolis / cloud-kernel

12 11月, 2009 1 次提交

ring-buffer: Add multiple iterations between benchmark timestamps · a6f0eb6a

由 Steven Rostedt 提交于 11月 11, 2009

The ring_buffer_benchmark does a gettimeofday after every write to the
ring buffer in its measurements. This adds the overhead of the call
to gettimeofday to the measurements and does not give an accurate picture
of the length of time it takes to record a trace.

This was first noticed with perf top:

------------------------------------------------------------------------------
   PerfTop:     679 irqs/sec  kernel:99.9% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             1673.00 - 27.8% : trace_clock_local
              806.00 - 13.4% : do_gettimeofday
              590.00 -  9.8% : rb_reserve_next_event
              554.00 -  9.2% : native_read_tsc
              431.00 -  7.2% : ring_buffer_lock_reserve
              365.00 -  6.1% : __rb_reserve_next
              355.00 -  5.9% : rb_end_commit
              322.00 -  5.4% : getnstimeofday
              268.00 -  4.5% : ring_buffer_unlock_commit
              262.00 -  4.4% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              113.00 -  1.9% : read_tsc
               91.00 -  1.5% : debug_smp_processor_id
               69.00 -  1.1% : trace_recursive_unlock
               66.00 -  1.1% : ring_buffer_event_data
               25.00 -  0.4% : _spin_unlock_irq

And the length of each write to the ring buffer measured at 310ns.

This patch adds a new module parameter called "write_interval" which is
defaulted to 50. This is the number of writes performed between
timestamps. After this patch perf top shows:

------------------------------------------------------------------------------
   PerfTop:     244 irqs/sec  kernel:100.0% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

             2842.00 - 40.4% : trace_clock_local
             1043.00 - 14.8% : rb_reserve_next_event
              784.00 - 11.1% : ring_buffer_lock_reserve
              600.00 -  8.5% : __rb_reserve_next
              579.00 -  8.2% : rb_end_commit
              440.00 -  6.3% : ring_buffer_unlock_commit
              290.00 -  4.1% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
              155.00 -  2.2% : debug_smp_processor_id
              117.00 -  1.7% : trace_recursive_unlock
              103.00 -  1.5% : ring_buffer_event_data
               28.00 -  0.4% : do_gettimeofday
               22.00 -  0.3% : _spin_unlock_irq
               14.00 -  0.2% : native_read_tsc
               11.00 -  0.2% : getnstimeofday

do_gettimeofday dropped from 13% usage to a mere 0.4%! (using the default
50 interval)  The measurement for each timestamp went from 310ns to 210ns.
That's 100ns (1/3rd) overhead that the gettimeofday call was introducing.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

a6f0eb6a

18 6月, 2009 1 次提交

ring-buffer: have benchmark test print to trace buffer · 4b221f03

由 Steven Rostedt 提交于 6月 17, 2009

Currently the output of the ring buffer benchmark/test prints to
the console. This test runs for ten seconds every ten seconds and
ouputs the result after every iteration. This needlessly fills up
the logs.

This patch makes the ring buffer benchmark/test print to the ftrace
buffer using trace_printk. To view the test results, you must examine
the debug/tracing/trace file.
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

4b221f03

17 6月, 2009 1 次提交

ring-buffer: have benchmark test handle discarded events · 9086c7b9

由 Steven Rostedt 提交于 6月 16, 2009

With the addition of commit:

  c7b09308
  ring-buffer: prevent adding write in discarded area

The ring buffer may now add discarded events when a write passes
the end of a buffer page. Before, a discarded event was only added
when the tracer deliberately created one. The ring buffer benchmark
test does not handle discarded events when it reads the buffer and
fails when it encounters one.

Also fix the increment for large data entries (luckily, the test did
not add any yet).

[ Impact: fix false failure of ring buffer self test ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9086c7b9

12 5月, 2009 2 次提交

ring-buffer: check for divide by zero in ring-buffer-benchmark · d988ff94

由 Steven Rostedt 提交于 5月 08, 2009

Although we check if "missed" is not zero, we divide by hit + missed,
and the addition can possible overflow and become a divide by zero.

This patch checks for this case, and will report it when it happens
then modify "hit" to make the calculation be non zero.

[ Impact: prevent possible divide by zero in ring-buffer-benchmark ]
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

d988ff94

ring-buffer: replace constants with time macros in ring-buffer-benchmark · 5a772b2b

由 Steven Rostedt 提交于 5月 08, 2009

The use of numeric constants is discouraged. It is cleaner and more
descriptive to use macros for constant time conversions.

This patch also removes an extra new line.

[ Impact: more descriptive time conversions ]
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5a772b2b

08 5月, 2009 2 次提交

ring-buffer: add total count in ring-buffer-benchmark · 7da3046d

由 Steven Rostedt 提交于 5月 07, 2009

It is nice to see the overhead of the benchmark test when tracing is
disabled. That is, we turn off the ring buffer just to see what the
cost of running the loop that calls into the ring buffer is.

Currently, if no entries wer made, we get 0. This is not informative.
This patch changes it to check if we had any "missed" (non recorded)
events. If so, a total count is also reported.

[ Impact: evaluate the over head of the ring buffer benchmark test ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

7da3046d

ring-buffer: only periodically call cond_resched to ring-buffer-benchmark · 0574ea42

由 Steven Rostedt 提交于 5月 07, 2009

Calling cond_resched at every iteration of the loop adds a bit of
overhead to the benchmark.

This patch does two things.

1) only calls cond-resched when CONFIG_PREEMPT is not enabled
2) only calls cond-resched after so many traces has been performed.

[ Impact: less overhead to the ring-buffer-benchmark ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0574ea42

07 5月, 2009 3 次提交

ring-buffer: remove complex calculations in ring-buffer-test · 29c8000e

由 Steven Rostedt 提交于 5月 07, 2009

Ingo Molnar thought that the code to calculate the time in cond_resched
is a bit too ugly and is not needed. This patch removes it and replaces
it with a simple call to cond_resched. I kept the comment that explains
the reason for the cond_resched.

[ Impact: remove ugly code ]
Reported-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

29c8000e

ring-buffer: change test to be more latency friendly · 3e07a4f6

由 Steven Rostedt 提交于 5月 06, 2009

The ring buffer benchmark/test runs a producer for 10 seconds.
This is done with preemption and interrupts enabled. But if the kernel
is not compiled with CONFIG_PREEMPT, it basically stops everything
but interrupts for 10 seconds.

Although this is just a test and is not for production, this attribute
can be quite annoying. It can also spawn badness elsewhere.

This patch solves the issues by calling "cond_resched" when the system
is not compiled with CONFIG_PREEMPT. It also keeps track of the time
spent to call cond_resched such that it does not go against the
time calculations. That is, if the task schedules away, the time scheduled
out is removed from the test data. Note, this only works for non PREEMPT
because we do not know when the task is scheduled out if we have PREEMPT
enabled.

[ Impact: prevent test from stopping the world for 10 seconds ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3e07a4f6

ring-buffer: check for failed allocation in ring buffer benchmark · 00c81a58

由 Steven Rostedt 提交于 5月 06, 2009

The result of the allocation of the ring buffer read page in the
ring buffer bench mark does not check the return to see if a page
was actually allocated. This patch fixes that.

[ Impact: avoid NULL dereference ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

00c81a58

06 5月, 2009 1 次提交

ring-buffer: add benchmark and tester · 5092dbc9

由 Steven Rostedt 提交于 5月 05, 2009

This patch adds code that can benchmark the ring buffer as well as
test it. This code can be compiled into the kernel (not recommended)
or as a module.

A separate ring buffer is used to not interfer with other users, like
ftrace. It creates a producer and a consumer (option to disable creation
of the consumer) and will run for 10 seconds, then sleep for 10 seconds
and then repeat.

While running, the producer will write 10 byte loads into the ring
buffer with just putting in the current CPU number. The reader will
continually try to read the buffer. The reader will alternate from reading
the buffer via event by event, or by full pages.

The output is a pr_info, thus it will fill up the syslogs.

  Starting ring buffer hammer
  End ring buffer hammer
  Time:     9000349 (usecs)
  Overruns: 12578640
  Read:     5358440  (by events)
  Entries:  0
  Total:    17937080
  Missed:   0
  Hit:      17937080
  Entries per millisec: 1993
  501 ns per entry
  Sleeping for 10 secs
  Starting ring buffer hammer
  End ring buffer hammer
  Time:     9936350 (usecs)
  Overruns: 0
  Read:     28146644  (by pages)
  Entries:  74
  Total:    28146718
  Missed:   0
  Hit:      28146718
  Entries per millisec: 2832
  353 ns per entry
  Sleeping for 10 secs

Time:      is the time the test ran
Overruns:  the number of events that were overwritten and not read
Read:      the number of events read (either by pages or events)
Entries:   the number of entries left in the buffer
                 (the by pages will only read full pages)
Total:     Entries + Read + Overruns
Missed:    the number of entries that failed to write
Hit:       the number of entries that were written

The above example shows that it takes ~353 nanosecs per entry when
there is a reader, reading by pages (and no overruns)

The event by event reader slowed the producer down to 501 nanosecs.

[ Impact: see how changes to the ring buffer affect stability and performance ]
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

5092dbc9

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功