1. 05 May 2010, 1 commit
    • tracing: Fix tracepoint.h DECLARE_TRACE() to allow more than one header · 2e26ca71
      Authored by Steven Rostedt
      When more than one header is included under CREATE_TRACE_POINTS,
      the DECLARE_TRACE() macro is not defined back to its original meaning,
      and the second include fails to initialize TRACE_EVENT()
      and DECLARE_TRACE() correctly.
      
      To fix this, tracepoint.h moves the define of DECLARE_TRACE()
      out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
      define of TRACE_EVENT()). This way define_trace.h will undef
      DECLARE_TRACE() at the end and allow new headers to start
      from scratch.
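
      A minimal sketch of the resulting tracepoint.h layout (illustrative,
      not the exact kernel source):

      	#ifndef _LINUX_TRACEPOINT_H
      	#define _LINUX_TRACEPOINT_H
      	/* ... one-time declarations stay under the include guard ... */
      	#endif /* _LINUX_TRACEPOINT_H */

      	/* Outside the guard, so define_trace.h can #undef it and each
      	 * new header included under CREATE_TRACE_POINTS starts clean: */
      	#undef DECLARE_TRACE
      	#define DECLARE_TRACE(name, proto, args) \
      		__DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))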
      
      This patch also requires fixing include/trace/events/napi.h.

      It currently uses DECLARE_TRACE() and should be converted to the
      TRACE_EVENT() format, but I'll leave that change to the authors of
      that file. Since napi.h depends on CREATE_TRACE_POINTS and does
      not define its own DEFINE_TRACE(), it must use the define_trace.h
      method instead.
      
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  2. 04 May 2010, 1 commit
    • tracing: Convert nop macros to static inlines · 4dbf6bc2
      Authored by Steven Rostedt
      The ftrace.h file defines several functions as macros when those
      functions are disabled by config options. This patch converts
      most of them to static inlines.
      
      There are two exceptions:
      
        register_ftrace_function() and unregister_ftrace_function()
      
      This is because their "ops" parameter must not be evaluated, since
      code using these functions is allowed to #ifdef out the creation
      of that parameter.
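
      A sketch of the conversion pattern (the macro bodies here are
      illustrative, not the exact kernel diff):

      	/* Before: a nop macro, which breaks where an expression or
      	 * bare statement is expected */
      	#define ftrace_start()	do { } while (0)

      	/* After: a static inline, type-checked and valid anywhere */
      	static inline void ftrace_start(void) { }

      	/* The exceptions stay macros so "ops" is never evaluated;
      	 * callers may #ifdef out the ops structure entirely */
      	#define register_ftrace_function(ops)	({ 0; })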
      
      This also fixes an error caused by recent changes:
      
       kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
       kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 28 April 2010, 2 commits
    • ring-buffer: Make non-consuming read less expensive with lots of cpus. · 72c9ddfd
      Authored by David Miller
      When performing a non-consuming read, a synchronize_sched() is
      performed once for every cpu which is actively tracing.
      
      This is very expensive, and can make it take several seconds to open
      up the 'trace' file with lots of cpus.
      
      Only one synchronize_sched() call is actually necessary.  What is
      desired is for all cpus to see the disabling state change.  So we
      transform the existing sequence:
      
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      where each ring_buffer_read_start() call performs a synchronize_sched(),
      into the following:
      
      	for_each_cpu() {
      		ring_buffer_read_prepare();
      	}
      	ring_buffer_read_prepare_sync();
      	for_each_cpu() {
      		ring_buffer_read_start();
      	}
      
      wherein only the single ring_buffer_read_prepare_sync() call needs to
      do the synchronize_sched().
      
      The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
      memory and increments ->record_disabled.
      
      In the second phase, ring_buffer_read_prepare_sync() makes sure this
      ->record_disabled state is fully visible to all cpus.

      In the final, third phase, the ring_buffer_read_start() calls reset
      the 'iter' objects allocated in the first phase, since we now know
      that none of the cpus are adding trace entries any more.
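
      A sketch of the resulting open path (the iterator plumbing is
      abbreviated; illustrative rather than the exact kernel code):

      	/* phase 1: allocate iterators and bump ->record_disabled */
      	for_each_tracing_cpu(cpu)
      		iter->buffer_iter[cpu] =
      			ring_buffer_read_prepare(buffer, cpu);

      	/* phase 2: a single synchronize_sched() covers all cpus */
      	ring_buffer_read_prepare_sync();

      	/* phase 3: writers are quiescent; reset each iterator */
      	for_each_tracing_cpu(cpu)
      		ring_buffer_read_start(iter->buffer_iter[cpu]);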
      
      This makes opening the 'trace' file nearly instantaneous on a
      sparc64 Niagara2 box with 128 cpus tracing.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Add graph output support for irqsoff tracer · 62b915f1
      Authored by Jiri Olsa
      Add function graph output to the irqsoff tracer.

      The graph output is enabled by setting the new 'display-graph'
      trace option.
      Signed-off-by: Jiri Olsa <jolsa@redhat.com>
      LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 22 April 2010, 1 commit
    • tracing: Dump either the oops's cpu source or all cpus buffers · cecbca96
      Authored by Frederic Weisbecker
      The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
      dump every cpu's buffer when an oops or panic happens.

      That's fine when you have few cpus, but it may take ages if you
      have many, and you can miss the real origin of the problem among
      all the cpu traces.
      
      Sometimes all you need is to dump the buffer of the cpu that
      triggered the oops; most of the time that is our main interest.
      
      This patch modifies ftrace_dump_on_oops to handle this choice.
      
      The ftrace_dump_on_oops kernel parameter, when it comes alone, has
      the same behaviour as before, but ftrace_dump_on_oops=orig_cpu
      will only dump the buffer of the cpu that oopsed.

      Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
      echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
      behaviour, but setting 2 switches to cpu-origin dump mode.
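
      A sketch of the resulting mode plumbing (the enum follows the
      description above; the surrounding code is illustrative):

      	enum ftrace_dump_mode {
      		DUMP_NONE,
      		DUMP_ALL,	/* ftrace_dump_on_oops, sysctl = 1 */
      		DUMP_ORIG,	/* ftrace_dump_on_oops=orig_cpu, sysctl = 2 */
      	};

      	/* on an oops, dump only the buffer of the cpu that oopsed */
      	ftrace_dump(DUMP_ORIG);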
      
      v2: Fix double setup
      v3: Fix spelling issues reported by Randy Dunlap
      v4: Also update __ftrace_dump in the selftests
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
  5. 19 April 2010, 2 commits
  6. 15 April 2010, 1 commit
  7. 14 April 2010, 2 commits
    • rcu: Better explain the condition parameter of rcu_dereference_check() · c08c68dd
      Authored by David Howells
      Better explain the condition parameter of rcu_dereference_check(),
      which describes the conditions under which the dereference is
      permitted to take place (and incorporate Yong Zhang's suggestion).
      This condition is only checked when lockdep RCU proving is enabled.
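
      A usage sketch of the documented parameter (the pointer and lock
      names are made up for illustration):

      	/* the dereference is legal inside an RCU read-side critical
      	 * section or while holding the update-side lock; lockdep
      	 * complains if neither condition holds */
      	p = rcu_dereference_check(gp,
      				  rcu_read_lock_held() ||
      				  lockdep_is_held(&gp_lock));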
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • rcu: Add rcu_access_pointer and rcu_dereference_protected · b62730ba
      Authored by Paul E. McKenney
      This patch adds variants of rcu_dereference() that handle
      situations where the RCU-protected data structure cannot change,
      perhaps due to our holding the update-side lock, or where the
      RCU-protected pointer is only to be fetched, not dereferenced.
      These are needed due to some performance concerns with using
      rcu_dereference() where it is not required, aside from the need
      for lockdep/sparse checking.
      
      The new rcu_access_pointer() primitive is for the case where the
      pointer is to be fetched but not dereferenced. This primitive may
      be used without protection, RCU or otherwise, because it uses
      ACCESS_ONCE().
      
      The new rcu_dereference_protected() primitive is for the case
      where updates are prevented, for example, by holding the
      update-side lock. This primitive does neither ACCESS_ONCE() nor
      smp_read_barrier_depends(), so it can only be used when updates
      are somehow prevented.
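
      A sketch of both primitives in use (the variable and lock names
      are illustrative):

      	/* fetch only: the pointer value is tested, never dereferenced,
      	 * so no protection is needed */
      	if (rcu_access_pointer(gp) == NULL)
      		return;

      	/* updates excluded by the update-side lock, so neither
      	 * ACCESS_ONCE() nor smp_read_barrier_depends() is required */
      	spin_lock(&gp_lock);
      	p = rcu_dereference_protected(gp, lockdep_is_held(&gp_lock));
      	p->value = new_value;
      	spin_unlock(&gp_lock);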
      Suggested-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 12 April 2010, 1 commit
    • NFSv4: fix delegated locking · 0df5dd4a
      Authored by Trond Myklebust
      Arnaud Giersch reports that NFSv4 locking is broken when we hold a
      delegation since commit 8e469ebd (NFSv4:
      Don't allow posix locking against servers that don't support it).
      
      According to Arnaud, the lock succeeds the first time he opens the file
      (since we cannot do a delegated open) but then fails after we start using
      delegated opens.
      
      The following patch fixes it by ensuring that locking behaviour is
      governed by a per-filesystem capability flag that is initially set, but
      gets cleared if the server ever returns an OPEN without the
      NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
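
      A hedged sketch of that logic; only the NFS4_OPEN_RESULT_LOCKTYPE_POSIX
      flag comes from the text above, the other names are assumptions:

      	/* on each OPEN reply: if the server does not advertise posix
      	 * lock support, clear the (initially set) capability flag */
      	if (!(res->rflags & NFS4_OPEN_RESULT_LOCKTYPE_POSIX))
      		server->caps &= ~NFS_CAP_POSIX_LOCK;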
      Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
  9. 10 April 2010, 4 commits
  10. 09 April 2010, 1 commit
  11. 08 April 2010, 2 commits
  12. 07 April 2010, 7 commits
  13. 06 April 2010, 4 commits
  14. 05 April 2010, 1 commit
  15. 04 April 2010, 1 commit
  16. 02 April 2010, 1 commit
    • ibft, x86: Change reserve_ibft_region() to find_ibft_region() · 042be38e
      Authored by Yinghai Lu
      This allows arch code to decide how to reserve the ibft.

      We should also reserve the ibft as early as possible, instead of
      at the BOOTMEM stage, in case the table is in a RAM range and is
      not reserved by the BIOS (this will often be the case).

      Move it to just after find_smp_config().

      Also, when CONFIG_NO_BOOTMEM=y, we will not have reserve_bootmem()
      anymore.
      
      -v2: fix an ibft typo pointed out by Konrad Rzeszutek Wilk <konrad@darnok.org>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4BB510FB.80601@kernel.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  17. 01 April 2010, 8 commits
    • ide: Requeue request after DMA timeout · 6072f749
      Authored by Herbert Xu
      I noticed that my KVM virtual machines were experiencing IDE
      issues resulting in processes stuck on waiting for buffers to
      complete.
      
      The root cause is of course race conditions in the ancient qemu
      backend that I'm using.  However, the fact that the guest isn't
      recovering is a bug.
      
      I've tracked it down to the change made last year to dequeue
      requests at the start rather than at the end in the IDE layer.
      
      commit 8f6205cd
      Author: Tejun Heo <tj@kernel.org>
      Date:   Fri May 8 11:53:59 2009 +0900
      
          ide: dequeue in-flight request
      
      The problem is that ide_dma_timeout_retry() does not requeue the
      current request, causing one request to be lost for each DMA
      timeout.

      This patch fixes this by requeueing the request.
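
      A hedged sketch of the fix; the queue/request plumbing is an
      assumption based on the description, not the actual diff:

      	/* in ide_dma_timeout_retry(): the request was dequeued when it
      	 * was started, so put it back instead of forgetting it */
      	spin_lock_irqsave(q->queue_lock, flags);
      	blk_requeue_request(q, rq);
      	spin_unlock_irqrestore(q->queue_lock, flags);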
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf: Use hot regs with software sched switch/migrate events · e49a5bd3
      Authored by Frederic Weisbecker
      The scheduler's task migration events don't work because they
      always pass NULL regs to perf_sw_event(). The event hence gets
      filtered out in perf_swevent_add().

      The scheduler's context switch events use task_pt_regs() to get
      the context where the event occurred, which is the wrong thing to
      do because it won't give us the place in the kernel where we went
      to sleep, but rather the place where we left userspace. The result
      is even more wrong if we switch from a kernel thread.
      
      Use the hot regs snapshot for both events, as they belong to the
      non-interrupt/exception based family of events. Unlike page faults
      and the like, which provide regs matching the exact origin of the
      event, we need to save the current context.

      This makes the task migration event work and fixes the context
      switch callchains and origin ip.
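
      A sketch of the hot-regs approach (simplified; the helper names
      follow this series but the exact call signatures are abbreviated):

      	struct pt_regs hot_regs;

      	/* snapshot the regs where we are *now*, in the scheduler,
      	 * rather than task_pt_regs(), which points at the userspace
      	 * entry frame */
      	perf_fetch_caller_regs(&hot_regs);
      	perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, &hot_regs, 0);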
      
      Example: perf record -a -e cs
      
      Before:
      
          10.91%      ksoftirqd/0                  0  [k] 0000000000000000
                      |
                      --- (nil)
                          perf_callchain
                          perf_prepare_sample
                          __perf_event_overflow
                          perf_swevent_overflow
                          perf_swevent_add
                          perf_swevent_ctx_event
                          do_perf_sw_event
                          __perf_sw_event
                          perf_event_task_sched_out
                          schedule
                          run_ksoftirqd
                          kthread
                          kernel_thread_helper
      
      After:
      
          23.77%  hald-addon-stor  [kernel.kallsyms]  [k] schedule
                  |
                  --- schedule
                     |
                     |--60.00%-- schedule_timeout
                     |          wait_for_common
                     |          wait_for_completion
                     |          blk_execute_rq
                     |          scsi_execute
                     |          scsi_execute_req
                     |          sr_test_unit_ready
                     |          |
                     |          |--66.67%-- sr_media_change
                     |          |          media_changed
                     |          |          cdrom_media_changed
                     |          |          sr_block_media_changed
                     |          |          check_disk_change
                     |          |          cdrom_open
      
      v2: Always build perf_arch_fetch_caller_regs() now that software
      events need it too. They don't need it from modules, unlike trace
      events, so we keep the EXPORT_SYMBOL in trace_event_perf.c.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
    • tracing: Show the lost events in the trace_pipe output · bc21b478
      Authored by Steven Rostedt
      Now that the ring buffer can keep track of where events are lost,
      use this information in the output of trace_pipe:
      
             hackbench-3588  [001]  1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
             hackbench-3588  [001]  1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
             hackbench-3588  [001]  1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
      CPU:1 [LOST 673 EVENTS]
             hackbench-3588  [001]  1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
             hackbench-3588  [001]  1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
             hackbench-3588  [001]  1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem
      
      It even works with the function graph tracer:
      
       2) ! 170.098 us  |                                            }
       2)   4.036 us    |                                            rcu_irq_exit();
       2)   3.657 us    |                                            idle_cpu();
       2) ! 190.301 us  |                                          }
      CPU:2 [LOST 2196 EVENTS]
       2)   0.853 us    |                            } /* cancel_dirty_page */
       2)               |                            remove_from_page_cache() {
       2)   1.578 us    |                              _raw_spin_lock_irq();
       2)               |                              __remove_from_page_cache() {
      
      Note, this does not work with the iterator "trace" file, since
      determining how many events were lost requires consuming the page
      from the ring buffer, which the iterator does not do.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • ring-buffer: Add place holder recording of dropped events · 66a8cb95
      Authored by Steven Rostedt
      Currently, when the ring buffer drops events, it does not record
      the fact that it did so. It does inform the writer that the event
      was dropped by returning a NULL event, but it does not put in any
      place holder where the event was dropped.
      
      This is not a trivial thing to add because the ring buffer mostly
      runs in overwrite (flight recorder) mode. That is, when the ring
      buffer is full, new data will overwrite old data.
      
      In producer/consumer mode, where new data is simply dropped when
      the ring buffer is full, it is trivial to add a placeholder for
      dropped events: when there's more room to write new data, a
      special event can be added to notify the reader about the dropped
      events.
      
      But in overwrite mode, any new write can overwrite events. A
      placeholder cannot be inserted into the ring buffer since there
      may never be room. A reader could also come in at any time and
      miss the placeholder.
      
      Luckily, the way the ring buffer works, the read side can find out
      whether events were lost, and how many. Every time a write takes
      place, if it overwrites the header page (the next read) it updates
      an "overrun" variable that keeps track of the number of lost
      events. When a reader swaps out a page from the ring buffer, it
      can record this number, perform the swap, and then check whether
      the number changed, taking the diff if it has, which is the number
      of events dropped. This can be stored by the reader and returned
      to callers of the reader.
      
      Since the reader-page swap will fail if the writer moved the head
      page after the reader set up the swap, this gives room to record
      the overruns without worrying about races: if the reader sets up
      the pages, records the overrun, then performs the swap, and the
      swap succeeds, the overrun variable has not been updated since the
      setup before the swap.
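
      A sketch of that reader-side sequence (the field and helper names
      are assumptions):

      	/* record, swap, then take the diff: the swap fails if the
      	 * writer moved the head page in between, so a successful swap
      	 * guarantees the count was stable */
      	overwrite = local_read(&cpu_buffer->overrun);
      	if (rb_swap_reader_page(cpu_buffer) == 0) {
      		lost_events = overwrite - cpu_buffer->last_overrun;
      		cpu_buffer->last_overrun = overwrite;
      	}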
      
      For binary readers of the ring buffer, a flag is set in the header
      of each sub page (sub buffer) of the ring buffer. This flag is
      embedded in the size field of the data on the sub buffer, in the
      31st bit (the size can be 32 or 64 bits depending on the
      architecture), but only 27 bits (less, actually) are needed for
      the actual size.
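
      A sketch of how a binary reader can separate that flag from the
      size (the bit position is from the description; the macro name is
      an assumption):

      	#define RB_MISSED_EVENTS	(1UL << 31)

      	size   = local_read(&bpage->commit);
      	missed = size & RB_MISSED_EVENTS;	/* events lost before page */
      	size  &= ~RB_MISSED_EVENTS;		/* low bits: actual size */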
      
      We could add a new field in the sub buffer header to also record
      the number of events dropped since the last read, but that would
      change the format of the binary ring buffer a bit too much.
      Perhaps this change can be made if the information on the number
      of events dropped is considered important enough.
      
      Note, the notification of dropped events is only used by consuming reads
      or peeking at the ring buffer. Iterating over the ring buffer does not
      keep this information because the necessary data is only available when
      a page swap is made, and the iterator does not swap out pages.
      
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not set · eb0c5377
      Authored by Steven Rostedt
      If modules are configured in the build but unloading of modules is
      not, then refcnt is not defined. Place the get/put module
      tracepoints under CONFIG_MODULE_UNLOAD, since they reference this
      field in the module structure.
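
      A minimal sketch of the guard (the event bodies are elided;
      illustrative):

      	#ifdef CONFIG_MODULE_UNLOAD
      	/* these events read module->refcnt, which only exists when
      	 * module unloading is configured */
      	TRACE_EVENT(module_get, /* ... */);
      	TRACE_EVENT(module_put, /* ... */);
      	#endif /* CONFIG_MODULE_UNLOAD */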
      
      As a side effect, this patch also reduces the code size when
      MODULE_UNLOAD is not set, because the unused tracepoints are not
      created.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Remove side effect from module tracepoints that caused a GPF · ae832d1e
      Authored by Li Zefan
      Remove the @refcnt argument, because it has side effects, and
      arguments with side effects are not skipped by the jump over
      disabled instrumentation; they are executed even when the
      tracepoint is disabled.
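
      A sketch of the hazard (the side-effecting expression is made up
      for illustration):

      	/* BAD: the argument is evaluated even while the tracepoint is
      	 * disabled, so its side effect always fires */
      	trace_module_get(mod, ip, atomic_inc_return(&mod_users));

      	/* FIXED: pass only side-effect-free arguments; derive values
      	 * inside the event's TP_fast_assign() instead */
      	trace_module_get(mod, ip);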
      
      This was also causing a GPF as found by Randy Dunlap:
      
      Subject: 2.6.33 GP fault only when built with tracing
      LKML-Reference: <4BA2B69D.3000309@oracle.com>
      
      Note, the current 2.6.34-rc has a fix for the actual cause of the GPF,
      but this fixes one of its triggers.
      Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Update comments · 50354a8a
      Authored by Li Zefan
      Make some comments consistent with the code.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4BA97FD0.7090202@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Convert some signal events to DEFINE_EVENT · 4bdde044
      Authored by Li Zefan
      Use DECLARE_EVENT_CLASS to remove duplicate code:
      
         text    data     bss     dec     hex filename
        23639    6084       8   29731    7423 kernel/signal.o.orig
        22727    6084       8   28819    7093 kernel/signal.o
      
      Two events are converted to use the new class:

        signal_queue_overflow (class): signal_overflow_fail, signal_lose_info
      
      No functional change.
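
      The resulting pattern, sketched with the event bodies elided
      (illustrative):

      	/* one class carries the shared TP_PROTO/TP_STRUCT boilerplate */
      	DECLARE_EVENT_CLASS(signal_queue_overflow, /* ... */);

      	/* each event is then a one-line instantiation of the class */
      	DEFINE_EVENT(signal_queue_overflow, signal_overflow_fail, /* ... */);
      	DEFINE_EVENT(signal_queue_overflow, signal_lose_info, /* ... */);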
      Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      LKML-Reference: <4BA97FBD.8070703@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>