1. 01 4月, 2010 5 次提交
    • S
      ring-buffer: Add lost event count to end of sub buffer · ff0ff84a
      Steven Rostedt 提交于
      Currently, binary readers of the ring buffer only know where events were
      lost, but not how many events were lost at that location.
      This information is available, but it would require adding another
      field to the sub buffer header to include it.
      
      But when a event can not fit at the end of a sub buffer, it is written
      to the next sub buffer. This means there is a good chance that the
      buffer may have room to hold this counter. If it does, write
      the counter at the end of the sub buffer and set another flag
      in the data size field that states that this information exists.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ff0ff84a
    • S
      tracing: Show the lost events in the trace_pipe output · bc21b478
      Steven Rostedt 提交于
      Now that the ring buffer can keep track of where events are lost.
      Use this information to the output of trace_pipe:
      
             hackbench-3588  [001]  1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
             hackbench-3588  [001]  1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
             hackbench-3588  [001]  1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
      CPU:1 [LOST 673 EVENTS]
             hackbench-3588  [001]  1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
             hackbench-3588  [001]  1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
             hackbench-3588  [001]  1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem
      
      Even works with the function graph tracer:
      
       2) ! 170.098 us  |                                            }
       2)   4.036 us    |                                            rcu_irq_exit();
       2)   3.657 us    |                                            idle_cpu();
       2) ! 190.301 us  |                                          }
      CPU:2 [LOST 2196 EVENTS]
       2)   0.853 us    |                            } /* cancel_dirty_page */
       2)               |                            remove_from_page_cache() {
       2)   1.578 us    |                              _raw_spin_lock_irq();
       2)               |                              __remove_from_page_cache() {
      
      Note, it does not work with the iterator "trace" file, since it requires
      the use of consuming the page from the ring buffer to determine how many
      events were lost, which the iterator does not do.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bc21b478
    • S
      ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Steven Rostedt 提交于
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In a produce/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add the placeholder
      for dropped events. When there's more room to write new data, then
      a special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A place
      holder can not be inserted into the ring buffer since there never
      may be room. A reader could also come in at anytime and miss the
      placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      if events were lost or not, and how many events. Everytime a write
      takes place, if it overwrites the header page (the next read) it
      updates a "overrun" variable that keeps track of the number of
      lost events. When a reader swaps out a page from the ring buffer,
      it can record this number, perfom the swap, and then check to
      see if the number changed, and take the diff if it has, which would be
      the number of events dropped. This can be stored by the reader
      and returned to callers of the reader.
      
      Since the reader page swap will fail if the writer moved the head
      page since the time the reader page set up the swap, this gives room
      to record the overruns without worrying about races. If the reader
      sets up the pages, records the overrun, than performs the swap,
      if the swap succeeds, then the overrun variable has not been
      updated since the setup before the swap.
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is embedded
      in the size field of the data on the sub buffer, in the 31st bit (the size
      can be 32 or 64 bits depending on the architecture), but only 27
      bits needs to be used for the actual size (less actually).
      
      We could add a new field in the sub buffer header to also record the
      number of events dropped since the last read, but this will change the
      format of the binary ring buffer a bit too much. Perhaps this change can
      be made if the information on the number of events dropped is considered
      important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      66a8cb95
    • S
      tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set · eb0c5377
      Steven Rostedt 提交于
      If modules are configured in the build but unloading of modules is not,
      then the refcnt is not defined. Place the get/put module tracepoints
      under CONFIG_MODULE_UNLOAD since it references this field in the module
      structure.
      
      As a side-effect, this patch also reduces the code when MODULE_UNLOAD
      is not set, because these unused tracepoints are not created.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      eb0c5377
    • L
      tracing: Remove side effect from module tracepoints that caused a GPF · ae832d1e
      Li Zefan 提交于
      Remove the @refcnt argument, because it has side-effects, and arguments with
      side-effects are not skipped by the jump over disabled instrumentation and are
      executed even when the tracepoint is disabled.
      
      This was also causing a GPF as found by Randy Dunlap:
      
      Subject: 2.6.33 GP fault only when built with tracing
      LKML-Reference: <4BA2B69D.3000309@oracle.com>
      
      Note, the current 2.6.34-rc has a fix for the actual cause of the GPF,
      but this fixes one of its triggers.
      Tested-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ae832d1e
  2. 30 3月, 2010 4 次提交
  3. 25 3月, 2010 3 次提交
  4. 24 3月, 2010 3 次提交
  5. 23 3月, 2010 1 次提交
    • J
      time: Fix accumulation bug triggered by long delay. · 830ec045
      John Stultz 提交于
      The logarithmic accumulation done in the timekeeping has some overflow
      protection that limits the max shift value. That means it will take
      more then shift loops to accumulate all of the cycles. This causes
      the shift decrement to underflow, which causes the loop to never exit.
      
      The simplest fix would be simply to do a:
      	if (shift)
      		shift--;
      
      However that is not optimal, as we know the cycle offset is larger
      then the interval << shift, the above would make shift drop to zero,
      then we would be spinning for quite awhile accumulating at interval
      chunks at a time.
      
      Instead, this patch only decreases shift if the offset is smaller
      then cycle_interval << shift.  This makes sure we accumulate using
      the largest chunks possible without overflowing tick_length, and limits
      the number of iterations through the loop.
      
      This issue was found and reported by Sonic Zhang, who also tested the fix.
      Many thanks your explanation and testing!
      Reported-by: NSonic Zhang <sonic.adi@gmail.com>
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Tested-by: NSonic Zhang <sonic.adi@gmail.com>
      LKML-Reference: <1268948850-5225-1-git-send-email-johnstul@us.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      830ec045
  6. 22 3月, 2010 1 次提交
  7. 19 3月, 2010 1 次提交
    • S
      ring-buffer: Do 8 byte alignment for 64 bit that can not handle 4 byte align · 2271048d
      Steven Rostedt 提交于
      The ring buffer uses 4 byte alignment while recording events into the
      buffer, even on 64bit machines. This saves space when there are lots
      of events being recorded at 4 byte boundaries.
      
      The ring buffer has a zero copy method to write into the buffer, with
      the reserving of space and then committing it. This may cause problems
      when writing an 8 byte word into a 4 byte alignment (not 8). For x86 and
      PPC this is not an issue, but on some architectures this would cause an
      out-of-alignment exception.
      
      This patch uses CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS to determine
      if it is OK to use 4 byte alignments on 64 bit machines. If it is not,
      it forces the ring buffer event header to be 8 bytes and not 4,
      and will align the length of the data to be 8 byte aligned.
      This keeps the data payload at 8 byte alignments and will allow these
      machines to run without issue.
      
      The trick to this is that the header can be either 4 bytes or 8 bytes
      depending on the length of the data payload. The 4 byte header
      has a length field that supports up to 112 bytes. If the length of
      the data is more than 112, the length field is set to zero, and the actual
      length is stored in the next 4 bytes after the header.
      
      When CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set, the code forces
      zero in the 4 byte header forcing the length to be stored in the 4 byte
      array, even with a small data load. It also forces the length of the
      data load to be 8 byte aligned. The combination of these two guarantee
      that the data is always at 8 byte alignment.
      Tested-by: NFrederic Weisbecker <fweisbec@gmail.com>
                 (on sparc64)
      Reported-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2271048d
  8. 17 3月, 2010 2 次提交
    • F
      perf: Fix unexported generic perf_arch_fetch_caller_regs · dcd5c166
      Frederic Weisbecker 提交于
      perf_arch_fetch_caller_regs() is exported for the overriden x86
      version, but not for the generic weak version.
      
      As a general rule, weak functions should not have their symbol
      exported in the same file they are defined.
      
      So let's export it on trace_event_perf.c as it is used by trace
      events only.
      
      This fixes:
      
      	ERROR: ".perf_arch_fetch_caller_regs" [fs/xfs/xfs.ko] undefined!
      	ERROR: ".perf_arch_fetch_caller_regs" [arch/powerpc/platforms/cell/spufs/spufs.ko] undefined!
      
      -v2: And also only build it if trace events are enabled.
      -v3: Fix changelog mistake
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1268697902-9518-1-git-send-regression-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      dcd5c166
    • K
      sched: Use proper type in sched_getaffinity() · 8bc037fb
      KOSAKI Motohiro 提交于
      Using the proper type fixes the following compiler warning:
      
        kernel/sched.c:4850: warning: comparison of distinct pointer types lacks a cast
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: torvalds@linux-foundation.org
      Cc: travis@sgi.com
      Cc: peterz@infradead.org
      Cc: drepper@redhat.com
      Cc: rja@sgi.com
      Cc: sharyath@in.ibm.com
      Cc: steiner@sgi.com
      LKML-Reference: <20100317090046.4C79.A69D9226@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8bc037fb
  9. 16 3月, 2010 2 次提交
  10. 15 3月, 2010 1 次提交
    • K
      sched: sched_getaffinity(): Allow less than NR_CPUS length · cd3d8031
      KOSAKI Motohiro 提交于
      [ Note, this commit changes the syscall ABI for > 1024 CPUs systems. ]
      
      Recently, some distro decided to use NR_CPUS=4096 for mysterious reasons.
      Unfortunately, glibc sched interface has the following definition:
      
      	# define __CPU_SETSIZE  1024
      	# define __NCPUBITS     (8 * sizeof (__cpu_mask))
      	typedef unsigned long int __cpu_mask;
      	typedef struct
      	{
      	  __cpu_mask __bits[__CPU_SETSIZE / __NCPUBITS];
      	} cpu_set_t;
      
      It mean, if NR_CPUS is bigger than 1024, cpu_set_t makes an
      ABI issue ...
      
      More recently, Sharyathi Nagesh reported following test program makes
      misterious syscall failure:
      
       -----------------------------------------------------------------------
       #define _GNU_SOURCE
       #include<stdio.h>
       #include<errno.h>
       #include<sched.h>
      
       int main()
       {
           cpu_set_t set;
           if (sched_getaffinity(0, sizeof(cpu_set_t), &set) < 0)
               printf("\n Call is failing with:%d", errno);
       }
       -----------------------------------------------------------------------
      
      Because the kernel assumes len argument of sched_getaffinity() is bigger
      than NR_CPUS. But now it is not correct.
      
      Now we are faced with the following annoying dilemma, due to
      the limitations of the glibc interface built in years ago:
      
       (1) if we change glibc's __CPU_SETSIZE definition, we lost
           binary compatibility of _all_ application.
      
       (2) if we don't change it, we also lost binary compatibility of
           Sharyathi's use case.
      
      Then, I would propse to change the rule of the len argument of
      sched_getaffinity().
      
      Old:
      	len should be bigger than NR_CPUS
      New:
      	len should be bigger than maximum possible cpu id
      
      This creates the following behavior:
      
       (A) In the real 4096 cpus machine, the above test program still
           return -EINVAL.
      
       (B) NR_CPUS=4096 but the machine have less than 1024 cpus (almost
           all machines in the world), the above can run successfully.
      
      Fortunatelly, BIG SGI machine is mainly used for HPC use case. It means
      they can rebuild their programs.
      
      IOW we hope they are not annoyed by this issue ...
      Reported-by: NSharyathi Nagesh <sharyath@in.ibm.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Mike Travis <travis@sgi.com>
      LKML-Reference: <20100312161316.9520.A69D9226@jp.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cd3d8031
  11. 13 3月, 2010 17 次提交