1. 27 2月, 2014 1 次提交
  2. 15 6月, 2011 1 次提交
    • V
      tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Vaibhav Nagarnaik 提交于
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      Compared the numbers found on kernel with and without this patch and
      found that logging latency decreases by 30-50 ns/call.
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7ea59064
  3. 28 4月, 2010 1 次提交
    • S
      ring-buffer: Make benchmark handle missed events · a838b2e6
      Steven Rostedt 提交于
      With the addition of the "missed events" flags that is stored in the
      commit field of the ring buffer page, the ring_buffer_benchmark
      was not updated to handle this. If events are missed, then the
      missed events flag is set in the ring buffer page, the benchmark
      will count that flag as part of the size of the page and will hit the BUG()
      when it tries to read beyond the page.
      
      The solution is simply to have the ring buffer benchmark mask off
      the extra bits.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a838b2e6
  4. 01 4月, 2010 1 次提交
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt 提交于
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a produce/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A place
      holder can not be inserted into the ring buffer since there never
      may be room. A reader could also come in at anytime and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      if events were lost or not, and how many events. Everytime a write
      takes place, if it overwrites the header page (the next read) it
      updates a "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perfom the swap, and then check to
      see if the number changed, and take the diff if it has, which would be
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader page set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, than performs the swap,
      if the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits needs to be used for the actual size (less actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66a8cb95
  5. 05 1月, 2010 1 次提交
  6. 26 11月, 2009 1 次提交
    • S
      ring-buffer-benchmark: Add parameters to set produce/consumer priorities · 7ac07434
      Steven Rostedt 提交于
      Running the ring-buffer-benchmark's threads at the lowest priority may
      work well for keeping it in the background, but it is not appropriate
      for the benchmarks.
      
      This patch adds 4 parameters to the module:
      
        consumer_fifo
        consumer_nice
        producer_fifo
        producer_nice
      
      By default the consumer and producer still run at nice +19.
      
      If the *_fifo options are set, they will override the *_nice values.
      
       modprobe ring_buffer_benchmark consumer_nice=0 producer_fifo=10
      
      The above will set the consumer thread to a nice value of 0, and
      the producer thread to a RT SCHED_FIFO priority of 10.
      
      Note, this patch also fixes a bug where calling set_user_nice on the
      consumer thread would oops the kernel when the parameter "disable_reader"
      is set.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7ac07434
  7. 23 11月, 2009 1 次提交
    • I
      ring-buffer benchmark: Run producer/consumer threads at nice +19 · 98e4833b
      Ingo Molnar 提交于
      The ring-buffer benchmark threads run on nice 0 by default, using
      up a lot of CPU time and slowing down the system:
      
         PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
        1024 root      20   0     0    0    0 D 95.3  0.0   4:01.67 rb_producer
        1023 root      20   0     0    0    0 R 93.5  0.0   2:54.33 rb_consumer
       21569 mingo     40   0 14852 1048  772 R  3.6  0.1   0:00.05 top
           1 root      40   0  4080  928  668 S  0.0  0.0   0:23.98 init
      
      Renice them to +19 to make them less intrusive.
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      98e4833b
  8. 12 11月, 2009 1 次提交
    • S
      ring-buffer: Add multiple iterations between benchmark timestamps · a6f0eb6a
      Steven Rostedt 提交于
      The ring_buffer_benchmark does a gettimeofday after every write to the
      ring buffer in its measurements. This adds the overhead of the call
      to gettimeofday to the measurements and does not give an accurate picture
      of the length of time it takes to record a trace.
      
      This was first noticed with perf top:
      
      ------------------------------------------------------------------------------
         PerfTop:     679 irqs/sec  kernel:99.9% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
                   samples    pcnt   kernel function
                   _______   _____   _______________
      
                   1673.00 - 27.8% : trace_clock_local
                    806.00 - 13.4% : do_gettimeofday
                    590.00 -  9.8% : rb_reserve_next_event
                    554.00 -  9.2% : native_read_tsc
                    431.00 -  7.2% : ring_buffer_lock_reserve
                    365.00 -  6.1% : __rb_reserve_next
                    355.00 -  5.9% : rb_end_commit
                    322.00 -  5.4% : getnstimeofday
                    268.00 -  4.5% : ring_buffer_unlock_commit
                    262.00 -  4.4% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
                    113.00 -  1.9% : read_tsc
                     91.00 -  1.5% : debug_smp_processor_id
                     69.00 -  1.1% : trace_recursive_unlock
                     66.00 -  1.1% : ring_buffer_event_data
                     25.00 -  0.4% : _spin_unlock_irq
      
      And the length of each write to the ring buffer measured at 310ns.
      
      This patch adds a new module parameter called "write_interval" which is
      defaulted to 50. This is the number of writes performed between
      timestamps. After this patch perf top shows:
      
      ------------------------------------------------------------------------------
         PerfTop:     244 irqs/sec  kernel:100.0% [1000Hz cpu-clock-msecs],  (all, 4 CPUs)
      ------------------------------------------------------------------------------
      
                   samples    pcnt   kernel function
                   _______   _____   _______________
      
                   2842.00 - 40.4% : trace_clock_local
                   1043.00 - 14.8% : rb_reserve_next_event
                    784.00 - 11.1% : ring_buffer_lock_reserve
                    600.00 -  8.5% : __rb_reserve_next
                    579.00 -  8.2% : rb_end_commit
                    440.00 -  6.3% : ring_buffer_unlock_commit
                    290.00 -  4.1% : ring_buffer_producer_thread 	[ring_buffer_benchmark]
                    155.00 -  2.2% : debug_smp_processor_id
                    117.00 -  1.7% : trace_recursive_unlock
                    103.00 -  1.5% : ring_buffer_event_data
                     28.00 -  0.4% : do_gettimeofday
                     22.00 -  0.3% : _spin_unlock_irq
                     14.00 -  0.2% : native_read_tsc
                     11.00 -  0.2% : getnstimeofday
      
      do_gettimeofday dropped from 13% usage to a mere 0.4%! (using the default
      50 interval)  The measurement for each timestamp went from 310ns to 210ns.
      That's 100ns (1/3rd) overhead that the gettimeofday call was introducing.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      a6f0eb6a
  9. 18 6月, 2009 1 次提交
    • S
      ring-buffer: have benchmark test print to trace buffer · 4b221f03
      Steven Rostedt 提交于
      Currently the output of the ring buffer benchmark/test prints to
      the console. This test runs for ten seconds every ten seconds and
      ouputs the result after every iteration. This needlessly fills up
      the logs.
      
      This patch makes the ring buffer benchmark/test print to the ftrace
      buffer using trace_printk. To view the test results, you must examine
      the debug/tracing/trace file.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4b221f03
  10. 17 6月, 2009 1 次提交
    • S
      ring-buffer: have benchmark test handle discarded events · 9086c7b9
      Steven Rostedt 提交于
      With the addition of commit:
      
        c7b09308
        ring-buffer: prevent adding write in discarded area
      
      The ring buffer may now add discarded events when a write passes
      the end of a buffer page. Before, a discarded event was only added
      when the tracer deliberately created one. The ring buffer benchmark
      test does not handle discarded events when it reads the buffer and
      fails when it encounters one.
      
      Also fix the increment for large data entries (luckily, the test did
      not add any yet).
      
      [ Impact: fix false failure of ring buffer self test ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9086c7b9
  11. 12 5月, 2009 2 次提交
  12. 08 5月, 2009 2 次提交
    • S
      ring-buffer: add total count in ring-buffer-benchmark · 7da3046d
      Steven Rostedt 提交于
      It is nice to see the overhead of the benchmark test when tracing is
      disabled. That is, we turn off the ring buffer just to see what the
      cost of running the loop that calls into the ring buffer is.
      
      Currently, if no entries wer made, we get 0. This is not informative.
      This patch changes it to check if we had any "missed" (non recorded)
      events. If so, a total count is also reported.
      
      [ Impact: evaluate the over head of the ring buffer benchmark test ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7da3046d
    • S
      ring-buffer: only periodically call cond_resched to ring-buffer-benchmark · 0574ea42
      Steven Rostedt 提交于
      Calling cond_resched at every iteration of the loop adds a bit of
      overhead to the benchmark.
      
      This patch does two things.
      
      1) only calls cond-resched when CONFIG_PREEMPT is not enabled
      2) only calls cond-resched after so many traces has been performed.
      
      [ Impact: less overhead to the ring-buffer-benchmark ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      0574ea42
  13. 07 5月, 2009 3 次提交
    • S
      ring-buffer: remove complex calculations in ring-buffer-test · 29c8000e
      Steven Rostedt 提交于
      Ingo Molnar thought that the code to calculate the time in cond_resched
      is a bit too ugly and is not needed. This patch removes it and replaces
      it with a simple call to cond_resched. I kept the comment that explains
      the reason for the cond_resched.
      
      [ Impact: remove ugly code ]
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      29c8000e
    • S
      ring-buffer: change test to be more latency friendly · 3e07a4f6
      Steven Rostedt 提交于
      The ring buffer benchmark/test runs a producer for 10 seconds.
      This is done with preemption and interrupts enabled. But if the kernel
      is not compiled with CONFIG_PREEMPT, it basically stops everything
      but interrupts for 10 seconds.
      
      Although this is just a test and is not for production, this attribute
      can be quite annoying. It can also spawn badness elsewhere.
      
      This patch solves the issues by calling "cond_resched" when the system
      is not compiled with CONFIG_PREEMPT. It also keeps track of the time
      spent to call cond_resched such that it does not go against the
      time calculations. That is, if the task schedules away, the time scheduled
      out is removed from the test data. Note, this only works for non PREEMPT
      because we do not know when the task is scheduled out if we have PREEMPT
      enabled.
      
      [ Impact: prevent test from stopping the world for 10 seconds ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3e07a4f6
    • S
      ring-buffer: check for failed allocation in ring buffer benchmark · 00c81a58
      Steven Rostedt 提交于
      The result of the allocation of the ring buffer read page in the
      ring buffer bench mark does not check the return to see if a page
      was actually allocated. This patch fixes that.
      
      [ Impact: avoid NULL dereference ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      00c81a58
  14. 06 5月, 2009 1 次提交
    • S
      ring-buffer: add benchmark and tester · 5092dbc9
      Steven Rostedt 提交于
      This patch adds code that can benchmark the ring buffer as well as
      test it. This code can be compiled into the kernel (not recommended)
      or as a module.
      
      A separate ring buffer is used to not interfer with other users, like
      ftrace. It creates a producer and a consumer (option to disable creation
      of the consumer) and will run for 10 seconds, then sleep for 10 seconds
      and then repeat.
      
      While running, the producer will write 10 byte loads into the ring
      buffer with just putting in the current CPU number. The reader will
      continually try to read the buffer. The reader will alternate from reading
      the buffer via event by event, or by full pages.
      
      The output is a pr_info, thus it will fill up the syslogs.
      
        Starting ring buffer hammer
        End ring buffer hammer
        Time:     9000349 (usecs)
        Overruns: 12578640
        Read:     5358440  (by events)
        Entries:  0
        Total:    17937080
        Missed:   0
        Hit:      17937080
        Entries per millisec: 1993
        501 ns per entry
        Sleeping for 10 secs
        Starting ring buffer hammer
        End ring buffer hammer
        Time:     9936350 (usecs)
        Overruns: 0
        Read:     28146644  (by pages)
        Entries:  74
        Total:    28146718
        Missed:   0
        Hit:      28146718
        Entries per millisec: 2832
        353 ns per entry
        Sleeping for 10 secs
      
      Time:      is the time the test ran
      Overruns:  the number of events that were overwritten and not read
      Read:      the number of events read (either by pages or events)
      Entries:   the number of entries left in the buffer
                       (the by pages will only read full pages)
      Total:     Entries + Read + Overruns
      Missed:    the number of entries that failed to write
      Hit:       the number of entries that were written
      
      The above example shows that it takes ~353 nanosecs per entry when
      there is a reader, reading by pages (and no overruns)
      
      The event by event reader slowed the producer down to 501 nanosecs.
      
      [ Impact: see how changes to the ring buffer affect stability and performance ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      5092dbc9