1. 15 Apr, 2009 5 commits
    • S
      tracing/events: convert event call sites to use a link list · a59fd602
      Steven Rostedt committed
      Impact: makes it possible to define events in modules
      
      The events are created by reading down the section that they are linked
      in by the macros. But this is not scalable to modules. This patch converts
      the manipulations to use a global link list, and on boot up it adds
      the items in the section to the list.
      
      This change will allow modules to add their tracing events to the list as
      well.
      
      Note, this change alone does not permit modules to use the TRACE_EVENT macros,
      but the change is needed for them to eventually do so.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
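The section-to-list conversion described in a59fd602 can be sketched in userspace C. This is only an illustration under stated assumptions: `struct event_call`, `builtin_events`, `register_event`, `init_builtin_events` and `find_event` are hypothetical names, the array stands in for the linker section, and the kernel's own `list_head` machinery is not reproduced.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an event descriptor; not the kernel's struct. */
struct event_call {
	const char *name;
	struct event_call *next;	/* link in the global list */
};

/* Global list head: built-in and "module" events both end up here. */
static struct event_call *event_list;

/* Stand-in for the linker section that holds the built-in events. */
static struct event_call builtin_events[] = {
	{ "sched_switch", NULL },
	{ "irq_handler_entry", NULL },
};

/* Add one event to the global list (what a module could call on load). */
static void register_event(struct event_call *call)
{
	call->next = event_list;
	event_list = call;
}

/* At "boot", walk the section once and put every entry on the list. */
static void init_builtin_events(void)
{
	size_t i;

	for (i = 0; i < sizeof(builtin_events) / sizeof(builtin_events[0]); i++)
		register_event(&builtin_events[i]);
}

/* Lookups only ever walk the list, never the section. */
static struct event_call *find_event(const char *name)
{
	struct event_call *call;

	for (call = event_list; call; call = call->next)
		if (strcmp(call->name, name) == 0)
			return call;
	return NULL;
}
```

Once every consumer walks the list instead of the section, an event that lives outside the built-in section can be added with the same `register_event()` call.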
    • S
      tracing/events: move the ftrace event tracing code to core · f42c85e7
      Steven Rostedt committed
      This patch moves the ftrace creation into include/trace/ftrace.h and
      simplifies the work of developers in adding new tracepoints.
      Just the act of creating the trace points in include/trace and including
      define_trace.h will create the events in the debugfs/tracing/events
      directory.
      
      This patch removes the need for include/trace/trace_events.h
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      tracing/events: move declarations from trace directory to core include · 97f20251
      Steven Rostedt committed
      In preparation to allowing trace events to happen in modules, we need
      to move some of the local declarations in the kernel/trace directory
      into include/linux.
      
      This patch simply moves the declarations and performs no context changes.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • S
      tracing: make trace_seq operations available for core kernel · 9504504c
      Steven Rostedt committed
      In the process of making the TRACE_EVENT macro work for modules, the trace_seq
      operations must be available for core kernel code.
      
      These operations are quite useful and can be used for other implementations.
      
      The main idea is that we create a trace_seq handle that acts very much
      like the seq_file handle.
      
      	struct trace_seq *s = kmalloc(sizeof(*s), GFP_KERNEL);
      
      	trace_seq_init(s);
      	trace_seq_printf(s, "some data %d\n", variable);
      
      	printk("%s", s->buffer);
      
      The main use is to allow a top level function to call several other functions
      that may store printf-like data into the buffer. Then at the end, the top
      level function can process all the data with any method it would like to.
      It could be passed to userspace, output via printk or even use seq_file:
      
      	trace_seq_to_user(s, ubuf, cnt);
      	seq_puts(m, s->buffer);
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
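The accumulate-then-emit pattern that 9504504c describes can be modelled in userspace. The sketch below is not the kernel's trace_seq implementation: `struct seq_model`, `seq_model_init`, `seq_model_printf`, the two helpers and the buffer size are all assumptions made for illustration, built on the C library's `vsnprintf`.

```c
#include <stdarg.h>
#include <stdio.h>

#define SEQ_MODEL_SIZE 128	/* arbitrary size chosen for this sketch */

/* Hypothetical userspace stand-in for struct trace_seq. */
struct seq_model {
	char buffer[SEQ_MODEL_SIZE];
	size_t len;		/* bytes used, not counting the terminator */
};

static void seq_model_init(struct seq_model *s)
{
	s->len = 0;
	s->buffer[0] = '\0';
}

/* Append printf-style data; returns 1 on success, 0 if it did not fit. */
static int seq_model_printf(struct seq_model *s, const char *fmt, ...)
{
	size_t room = SEQ_MODEL_SIZE - s->len;
	va_list ap;
	int ret;

	va_start(ap, fmt);
	ret = vsnprintf(s->buffer + s->len, room, fmt, ap);
	va_end(ap);

	if (ret < 0 || (size_t)ret >= room) {
		s->buffer[s->len] = '\0';	/* discard the partial write */
		return 0;
	}
	s->len += (size_t)ret;
	return 1;
}

/* Two helpers that each contribute a piece of output. */
static void append_irq(struct seq_model *s, int irq)
{
	seq_model_printf(s, "irq=%d ", irq);
}

static void append_cpu(struct seq_model *s, int cpu)
{
	seq_model_printf(s, "cpu=%d\n", cpu);
}

/* Top-level function: collect everything, then emit it in one go. */
static void report(struct seq_model *s, int irq, int cpu)
{
	seq_model_init(s);
	append_irq(s, irq);
	append_cpu(s, cpu);
	fputs(s->buffer, stdout);
}
```

The helpers never decide where the text goes; only `report()` does, which is why the same buffer could just as well be handed to a different sink.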
    • S
      tracing: create automated trace defines · a8d154b0
      Steven Rostedt committed
      This patch lowers the number of places a developer must modify to add
      new tracepoints. The current method to add a new tracepoint
      into an existing system is to write the trace point macro in the
      trace header with one of the macros TRACE_EVENT, TRACE_FORMAT or
      DECLARE_TRACE, then they must add the same named item into the C file
      with the macro DEFINE_TRACE(name) and then add the trace point.
      
      This change removes the need to add DEFINE_TRACE(name).
      Every file that uses the tracepoint must still include the trace/<type>.h
      file, but one C file must also add a define before including
      that file.
      
       #define CREATE_TRACE_POINTS
       #include <trace/mytrace.h>
      
      This will cause the trace/mytrace.h file to also produce the C code
      necessary to implement the trace point.
      
      Note, if more than one trace/<type>.h is used to create the C code
      it is best to list them all together.
      
       #define CREATE_TRACE_POINTS
       #include <trace/foo.h>
       #include <trace/bar.h>
       #include <trace/fido.h>
      
      Thanks to Mathieu Desnoyers and Christoph Hellwig for coming up with
      the cleaner solution of the define above the includes over my first
      design to have the C code include a "special" header.
      
      This patch converts sched, irq and lockdep and skb to use this new
      method.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Zhao Lei <zhaolei@cn.fujitsu.com>
      Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
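The define-before-include idea from a8d154b0 can be illustrated in a single file. This is a rough sketch, not the kernel's define_trace.h: the "header" is inlined between comment markers, and `trace_mytrace` and `mytrace_hits` are hypothetical names. Every includer would see the declaration; only the file that defines CREATE_TRACE_POINTS would also get the definition.

```c
#include <stdio.h>

/* Written by exactly one C file, before the include. */
#define CREATE_TRACE_POINTS

/* ---- begin: what a hypothetical trace/mytrace.h could contain ---- */

/* Declaration: emitted for every file that includes the header. */
void trace_mytrace(int value);

#ifdef CREATE_TRACE_POINTS
/* Definition: emitted only in the file that asked for it. */
static int mytrace_hits;

void trace_mytrace(int value)
{
	mytrace_hits++;
	printf("mytrace: value=%d\n", value);
}
#endif

/* ---- end of the hypothetical header ---- */
```

A second C file that includes the same header without the define would compile only the declaration and link against the single definition above.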
  2. 14 Apr, 2009 9 commits
    • S
      tracing: consolidate trace and trace_event headers · ea20d929
      Steven Rostedt committed
      Impact: clean up
      
      Neil Horman (et al.) criticized the way the trace events were broken up
      into two files. The reason for that was that ftrace needed to separate out
      the declarations from where the #include <linux/tracepoint.h> was used.
      It then dawned on me that the tracepoint.h header only needs to define the
      TRACE_EVENT macro if it is not already defined.
      
      The solution is simply to test if TRACE_EVENT is defined, and if it is not
      then the linux/tracepoint.h header can define it. This change consolidates
      all the <traces>.h and <traces>_event_types.h into the <traces>.h file.
      Reported-by: Neil Horman <nhorman@tuxdriver.com>
      Reported-by: Theodore Tso <tytso@mit.edu>
      Reported-by: Jiaying Zhang <jiayingz@google.com>
      Cc: Zhaolei <zhaolei@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • T
      tracing/filters: allow on-the-fly filter switching · 0a19e53c
      Tom Zanussi committed
      This patch allows event filters to be safely removed or switched
      on-the-fly while avoiding the use of rcu or the suspension of tracing of
      previous versions.
      
      It does it by adding a new filter_pred_none() predicate function which
      does nothing and by never deallocating either the predicates or any of
      the filter_pred members used in matching; the predicate lists are
      allocated and initialized during ftrace_event_calls initialization.
      
      Whenever a filter is removed or replaced, the filter_pred_* functions
      currently in use by the affected ftrace_event_call are immediately
      switched over to the filter_pred_none() function, while the rest of
      the filter_pred members are left intact, allowing any currently
      executing filter_pred_* functions to finish up, using the values they're
      currently using.
      
      In the case of filter replacement, the new predicate values are copied
      into the old predicates after the above step, and the filter_pred_none()
      functions are replaced by the filter_pred_* functions for the new
      filter.  In this case, it is possible though very unlikely that a
      previous filter_pred_* is still running even after the
      filter_pred_none() switch and the switch to the new filter_pred_*.  In
      that case, however, because nothing has been deallocated in the
      filter_pred, the worst that can happen is that the old filter_pred_*
      function sees the new values and as a result produces either a false
      positive or a false negative, depending on the values it finds.
      
      So one downside to this method is that rarely, it can produce a bad
      match during the filter switch, but it should be possible to live with
      that, IMHO.
      
      The other downside is that at least in this patch the predicate lists
      are always pre-allocated, taking up memory from the start.  They could
      probably be allocated on first-use, and de-allocated when tracing is
      completely stopped - if this patch makes sense, I could create another
      one to do that later on.
      
      Oh, and it also places a restriction on the size of __arrays in events,
      currently set to 128, since they can't be larger than the now embedded
      str_val arrays in the filter_pred struct.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: paulmck@linux.vnet.ibm.com
      LKML-Reference: <1239610670.6660.49.camel@tropicana>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
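The switch-to-none sequence from 0a19e53c can be sketched with a function pointer in a slot that is never freed. This single-threaded model only shows the order of the steps; `struct pred`, `pred_none`, `pred_equal`, `pred_not_equal` and the other names are hypothetical, the return value of the "none" predicate is an assumption, and the concurrency aspects discussed in the commit message are deliberately left out.

```c
#include <stddef.h>

struct pred;
typedef int (*pred_fn)(const struct pred *p, long field);

/* Hypothetical predicate slot; allocated once and never deallocated. */
struct pred {
	pred_fn fn;
	long val;
};

/* Placeholder predicate used while a filter is removed or replaced. */
static int pred_none(const struct pred *p, long field)
{
	(void)p;
	(void)field;
	return 0;	/* assumption: "does nothing" means "no match" */
}

static int pred_equal(const struct pred *p, long field)
{
	return field == p->val;
}

static int pred_not_equal(const struct pred *p, long field)
{
	return field != p->val;
}

/* Removing a filter only swaps the function; the values stay intact. */
static void pred_remove(struct pred *p)
{
	p->fn = pred_none;
}

/* Replacing a filter: park on pred_none, copy values, then switch. */
static void pred_replace(struct pred *p, pred_fn fn, long val)
{
	p->fn = pred_none;	/* step 1: stop matching with the old function */
	p->val = val;		/* step 2: copy the new values into the old slot */
	p->fn = fn;		/* step 3: install the new predicate function */
}

static int pred_match(const struct pred *p, long field)
{
	return p->fn(p, field);
}
```

Because the slot itself is never released, a caller that still holds the old function pointer can at worst compare against the new value, which is the false positive or negative the commit message accepts.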
    • T
      tracing/filters: use ring_buffer_discard_commit() in filter_check_discard() · eb02ce01
      Tom Zanussi committed
      This patch changes filter_check_discard() to make use of the new
      ring_buffer_discard_commit() function and modifies the current users to
      call the old commit function in the non-discard case.
      
      It also introduces a version of filter_check_discard() that uses the
      global trace buffer (filter_current_check_discard()) for those cases.
      
      v2 changes:
      
      - fix compile error noticed by Ingo Molnar
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1239178554.10295.36.camel@tropicana>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing/infrastructure: separate event tracer from event support · 5f77a88b
      Tom Zanussi committed
      Add a new config option, CONFIG_EVENT_TRACING that gets selected
      when CONFIG_TRACING is selected and adds everything needed by the stuff
      in trace_export - basically all the event tracing support needed by e.g.
      bprint, minus the actual events, which are only included if
      CONFIG_EVENT_TRACER is selected.
      
      So CONFIG_EVENT_TRACER can be used to turn on or off the generated events
      (what I think of as the 'event tracer'), while CONFIG_EVENT_TRACING turns
      on or off the base event tracing support used by both the event tracer and
      the other things such as bprint that can't be configured out.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fweisbec@gmail.com
      LKML-Reference: <1239178441.10295.34.camel@tropicana>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      tracing/filters: use ring_buffer_discard_commit for discarded events · 77d9f465
      Steven Rostedt committed
      The ring_buffer_discard_commit makes better usage of the ring_buffer
      when an event has been discarded. It tries to remove it completely if
      possible.
      
      This patch converts the trace event filtering to use
      ring_buffer_discard_commit instead of the ring_buffer_event_discard.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • S
      ring-buffer: add ring_buffer_discard_commit · fa1b47dd
      Steven Rostedt committed
      The ring_buffer_discard_commit is similar to ring_buffer_event_discard
      but it can only be done on an event that has yet to be committed.
      Unpredictable results can happen otherwise.
      
      The main difference between ring_buffer_discard_commit and
      ring_buffer_event_discard is that ring_buffer_discard_commit will try
      to free the data in the ring buffer if nothing has added data
      after the reserved event. If something did, then it acts almost the
      same as ring_buffer_event_discard followed by a
      ring_buffer_unlock_commit.
      
      Note, either ring_buffer_discard_commit or ring_buffer_unlock_commit
      can be called on an event, not both.
      
      This commit also exports both discard functions to be usable by
      GPL modules.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
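The difference between the two discard paths in fa1b47dd can be modelled with a simple linear buffer. This is a toy model, not the ring buffer code: `struct buffer_model`, `struct event_model`, `reserve` and `discard_commit` are hypothetical names, and "nothing has added data after the reserved event" is reduced to comparing the end of the event with the write head.

```c
#include <stddef.h>

#define BUF_MODEL_SIZE 64	/* arbitrary capacity for this sketch */

/* Hypothetical buffer: only tracks where the next write would land. */
struct buffer_model {
	size_t head;
};

/* Hypothetical reserved-but-uncommitted event. */
struct event_model {
	size_t start;
	size_t len;
	int discarded;	/* set when the space could not be reclaimed */
};

/* Reserve len bytes at the head; returns 1 on success, 0 if full. */
static int reserve(struct buffer_model *b, struct event_model *e, size_t len)
{
	if (len > BUF_MODEL_SIZE - b->head)
		return 0;
	e->start = b->head;
	e->len = len;
	e->discarded = 0;
	b->head += len;
	return 1;
}

/*
 * Discard an uncommitted event. If nothing was written after it, rewind
 * the head so the space is freed (returns 1). Otherwise just mark the
 * event as discarded and leave it in place (returns 0).
 */
static int discard_commit(struct buffer_model *b, struct event_model *e)
{
	if (e->start + e->len == b->head) {
		b->head = e->start;
		return 1;
	}
	e->discarded = 1;
	return 0;
}
```

In the second case the bytes stay occupied, which corresponds to the "acts almost the same as ring_buffer_event_discard" path in the commit message.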
    • T
      tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro · e45f2e2b
      Tom Zanussi committed
      Frederic Weisbecker suggested that the trace_special event shouldn't be
      filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
      that allows an event format to be exported without having a filter
      attached, and removes filtering from the trace_special event.
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • T
      tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events · e1112b4d
      Tom Zanussi committed
      This patch adds run-time field descriptions to all the event formats
      exported using TRACE_EVENT_FORMAT.  It also hooks up all the tracers
      that use them (i.e. the tracers in the 'ftrace subsystem') so they can
      also have their output filtered by the event-filtering mechanism.
      
      When I was testing this, there were a couple of things that fooled me
      into thinking the filters weren't working, when actually they were -
      I'll mention them here so others don't make the same mistakes (and file
      bug reports. ;-)
      
      One is that some of the tracers trace multiple events e.g. the
      sched_switch tracer uses the context_switch and wakeup events, and if
      you don't set filters on all of the traced events, the unfiltered output
      from the events without filters on them can make it look like the
      filtering as a whole isn't working properly, when actually it is doing
      what it was asked to do - it just wasn't asked to do the right thing.
      
      The other is that for the really high-volume tracers e.g. the function
      tracer, the volume of filtered events can be so high that it pushes the
      unfiltered events out of the ring buffer before they can be read so e.g.
      cat'ing the trace file repeatedly shows either no output or, once in
      a while, some output that isn't there the next time you read the
      trace, which isn't what you normally expect when reading the trace file.
      If you read from the trace_pipe file though, you can catch them before
      they disappear.
      
      Changes from v1:
      
      As suggested by Frederic Weisbecker:
      
      - get rid of externs in functions
      - added unlikely() to filter_check_discard()
      Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • R
      PM/Hibernate: Wait for SCSI devices scan to complete during resume · c7510859
      Rafael J. Wysocki committed
      There is a race between resume from hibernation and the asynchronous
      scanning of SCSI devices and to prevent it from happening we need to
      call scsi_complete_async_scans() during resume from hibernation.
      
      In addition, if the resume from hibernation is userland-driven, it's
      better to wait for all device probes in the kernel to complete before
      attempting to open the resume device.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 12 Apr, 2009 10 commits
    • F
      lockdep: continue lock debugging despite some taints · 574bbe78
      Frederic Weisbecker committed
      Impact: broaden lockdep checks
      
      Lockdep is disabled after any kernel taint. This can be convenient
      for ignoring bad locking issues whose source lies outside the kernel
      tree. Nevertheless, it can be a frustrating experience for
      staging developers or those who hit a warning but are focused
      on other things that require lockdep.
      
      v2 of this patch simply no longer disables lockdep in case
      of TAINT_CRAP and TAINT_WARN events.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: LTP <ltp-list@lists.sourceforge.net>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Greg KH <gregkh@suse.de>
      LKML-Reference: <1239412638-6739-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
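The behaviour change in 574bbe78 amounts to exempting two taint flags from the lockdep shutdown. A minimal model, assuming hypothetical flag names and values (`enum taint_model` is not the kernel's definition, and `add_taint_model` is not the kernel's add_taint()):

```c
/* Hypothetical taint flags for this sketch; not the kernel's values. */
enum taint_model {
	TAINT_MODEL_PROPRIETARY,
	TAINT_MODEL_BAD_PAGE,
	TAINT_MODEL_WARN,
	TAINT_MODEL_CRAP,
};

static int lockdep_enabled = 1;

/* Record a taint; returns 1 if lockdep is still enabled afterwards. */
static int add_taint_model(enum taint_model flag)
{
	switch (flag) {
	case TAINT_MODEL_WARN:
	case TAINT_MODEL_CRAP:
		/* Warnings and staging drivers leave lockdep running. */
		break;
	default:
		/* Any other taint still switches lock debugging off. */
		lockdep_enabled = 0;
		break;
	}
	return lockdep_enabled;
}
```

Once any non-exempt taint has been recorded, later exempt taints do not turn lockdep back on.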
    • F
      lockdep: warn about lockdep disabling after kernel taint · 9eeba613
      Frederic Weisbecker committed
      Impact: provide useful missing info for developers
      
      Kernel taint can occur in several situations such as warnings,
      loading of proprietary or staging modules, bad pages, etc.
      
      But when such taint happens, a developer might still be working on
      the kernel, expecting that lockdep is still enabled. But a taint
      disables lockdep without ever warning about it.
      Such a kernel behaviour doesn't really help for kernel development.
      
      This patch adds this missing warning.
      
      Since the taint is usually set after the main message that
      explains the real source of the issue, it seems safe to warn about it inside
      add_taint() so that it appears last, without obscuring the main
      information.
      
      v2: Use a generic helper to disable lockdep instead of an
          open coded xchg().
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1239412638-6739-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      blktrace: fix output of BLK_TC_PC events · 66de7792
      Li Zefan committed
      BLK_TC_PC events should be treated differently from BLK_TC_FS events.
      
      Before this patch:
      
       # echo 1 > /sys/block/sda/sda1/trace/enable
       # echo pc > /sys/block/sda/sda1/trace/act_mask
       # echo blk > /debugfs/tracing/current_tracer
       # (generate some BLK_TC_PC events)
       # cat trace
              bash-2184  [000]  1774.275413:   8,7    I   N [bash]
              bash-2184  [000]  1774.275435:   8,7    D   N [bash]
              bash-2184  [000]  1774.275540:   8,7    I   R [bash]
              bash-2184  [000]  1774.275547:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275580:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275648:   8,7    I   R [bash]
              bash-2184  [000]  1774.275653:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275682:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275739:   8,7    I   R [bash]
              bash-2184  [000]  1774.275744:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275771:   8,7    C   N 0 [0]
              bash-2184  [000]  1774.275804:   8,7    I   R [bash]
              bash-2184  [000]  1774.275808:   8,7    D   R [bash]
       ksoftirqd/0-4     [000]  1774.275836:   8,7    C   N 0 [0]
      
      After this patch:
      
       # cat trace
              bash-2263  [000]   366.782149:   8,7    I   N 0 (00 ..) [bash]
              bash-2263  [000]   366.782323:   8,7    D   N 0 (00 ..) [bash]
              bash-2263  [000]   366.782557:   8,7    I   R 8 (25 00 ..) [bash]
              bash-2263  [000]   366.782560:   8,7    D   R 8 (25 00 ..) [bash]
       ksoftirqd/0-4     [000]   366.782582:   8,7    C   N (25 00 ..) [0]
              bash-2263  [000]   366.782648:   8,7    I   R 8 (5a 00 3f 00) [bash]
              bash-2263  [000]   366.782650:   8,7    D   R 8 (5a 00 3f 00) [bash]
       ksoftirqd/0-4     [000]   366.782669:   8,7    C   N (5a 00 3f 00) [0]
              bash-2263  [000]   366.782710:   8,7    I   R 8 (5a 00 08 00) [bash]
              bash-2263  [000]   366.782713:   8,7    D   R 8 (5a 00 08 00) [bash]
       ksoftirqd/0-4     [000]   366.782730:   8,7    C   N (5a 00 08 00) [0]
              bash-2263  [000]   366.783375:   8,7    I   R 36 (5a 00 08 00) [bash]
              bash-2263  [000]   366.783379:   8,7    D   R 36 (5a 00 08 00) [bash]
       ksoftirqd/0-4     [000]   366.783404:   8,7    C   N (5a 00 08 00) [0]
      
      This is what we do with PC events in user-space blktrace.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <49D32387.9040106@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      blktrace: fix output of unknown events · b78825d6
      Li Zefan committed
      Not all events are pc (packet command) events. An event is a pc
      event only if it has BLK_TC_PC bit set.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <49D3236D.3090705@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • Z
      tracing, kmemtrace: Separate include/trace/kmemtrace.h to kmemtrace part and tracepoint part · 02af61bb
      Zhaolei committed
      Impact: refactor code for future changes
      
      The current kmemtrace.h is used both as the header file of kmemtrace and as
      the definition of kmem's tracepoints.
      
      A tracepoint definition file may be used by other code, and should contain
      only tracepoint definitions.
      
      We can separate include/trace/kmemtrace.h into 2 files:
      
        include/linux/kmemtrace.h: header file for kmemtrace
        include/trace/kmem.h:      definition of kmem tracepoints
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      LKML-Reference: <49DEE68A.5040902@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing/filters: return proper error code when writing filter file · 44e9c8b7
      Li Zefan committed
      - propagate return value of filter_add_pred() to the user
      
      - return -ENOSPC rather than -ENOMEM or -EINVAL when the filter array
        is full
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49E04CF0.3010105@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
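The two points of 44e9c8b7 (propagate the helper's return value, and use -ENOSPC for a full array) can be sketched as follows. The names `struct filter_model`, `filter_model_add_pred`, `filter_model_write` and the capacity `MAX_PREDS_MODEL` are assumptions for illustration, not the kernel's code.

```c
#include <errno.h>

#define MAX_PREDS_MODEL 8	/* hypothetical capacity of the filter array */

struct filter_model {
	int n_preds;
	int preds[MAX_PREDS_MODEL];
};

/* Add one predicate; a full array is reported as -ENOSPC. */
static int filter_model_add_pred(struct filter_model *f, int pred)
{
	if (f->n_preds >= MAX_PREDS_MODEL)
		return -ENOSPC;
	f->preds[f->n_preds++] = pred;
	return 0;
}

/*
 * Write handler: pass the helper's error code through unchanged instead
 * of replacing it with a generic one; on success report the byte count.
 */
static long filter_model_write(struct filter_model *f, int pred, long count)
{
	int err = filter_model_add_pred(f, pred);

	if (err < 0)
		return err;
	return count;
}
```

With the error passed through, a user writing to the filter file can tell "no room left" apart from "invalid input".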
    • L
      tracing/filters: allow user input integer to be oct or hex · a3e0ab05
      Li Zefan committed
      Before patch:
      
       # echo 'parent_pid == 0x10' > events/sched/sched_process_fork/filter
       # cat sched/sched_process_fork/filter
       parent_pid == 0
      
      After patch:
      
       # cat sched/sched_process_fork/filter
       parent_pid == 16
      
      Also check the input more strictly.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49E04C53.4010600@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
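A userspace analogue of the a3e0ab05 behaviour uses the C library's `strtoull` with base 0, which accepts decimal, octal (leading `0`) and hex (leading `0x`) input. This is a sketch, not the kernel helper; `parse_filter_value` is a hypothetical name, and the strictness checks are one plausible reading of "check the input more strictly".

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a filter value. Base 0 lets strtoull() accept "16", "020" and
 * "0x10" alike. Missing input and trailing garbage are rejected.
 */
static int parse_filter_value(const char *str, unsigned long long *val)
{
	char *end;

	if (!str || *str == '\0')
		return -EINVAL;

	errno = 0;
	*val = strtoull(str, &end, 0);
	if (errno || end == str || *end != '\0')
		return -EINVAL;
	return 0;
}
```

The explicit NULL and empty-string check also covers the case where an operator is given without a value, so the parser is never handed a NULL pointer.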
    • L
      tracing/filters: fix NULL pointer dereference · bcabd91c
      Li Zefan committed
      Try this, and you'll see a NULL pointer dereference bug:
      
        # echo -n 'parent_comm ==' > sched/sched_process_fork/filter
      
      This happens because we passed a NULL pointer to simple_strtoull().
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49E04C43.1050504@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing/filters: NIL-terminate user input filter · 8433a40e
      Li Zefan committed
      Make sure messages from user space are NIL-terminated strings,
      otherwise we could dump random memory while reading filter file.
      
      Try this:
       # echo 'parent_comm ==' > events/sched/sched_process_fork/filter
       # cat events/sched/sched_process_fork/filter
       parent_comm == �
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Tom Zanussi <tzanussi@gmail.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49E04C32.6060508@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
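The fix in 8433a40e comes down to terminating the copied bytes before treating them as a string. A userspace sketch, assuming a hypothetical helper `copy_filter_input` and using `memcpy` where kernel code would copy from user space:

```c
#include <errno.h>
#include <string.h>

/*
 * Copy cnt bytes of unterminated input into buf and terminate the
 * result. Returns the number of bytes copied, or -EINVAL if the input
 * would not leave room for the terminator.
 */
static long copy_filter_input(char *buf, size_t bufsz,
			      const char *ubuf, size_t cnt)
{
	if (cnt >= bufsz)
		return -EINVAL;

	memcpy(buf, ubuf, cnt);	/* stand-in for the copy from user space */
	buf[cnt] = '\0';	/* without this, readers run into stale bytes */
	return (long)cnt;
}
```

Without the terminator, anything that later prints the buffer keeps reading whatever bytes happened to follow the input, which is the garbage shown in the commit message.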
    • L
      async: Fix module loading async-work regression · d6de2c80
      Linus Torvalds committed
      Several drivers use asynchronous work to do device discovery, and we
      synchronize with them in the compiled-in case before we actually try to
      mount root filesystems etc.
      
      However, when compiled as modules, that synchronization is missing - the
      module loading completes, but the driver hasn't actually finished
      probing for devices, and that means that any user mode that expects to
      use the devices after the 'insmod' is now potentially broken.
      
      We already saw one case of a similar issue in the ACPI battery code,
      where the kernel itself expected the module to be all done, and unmapped
      the init memory - but the async device discovery was still running.
      That got hacked around by just removing the "__init" (see commit
      5d38258e "ACPI battery: fix async boot
      oops"), but the real fix is to just make the module loading wait for all
      async work to be completed.
      
      It will slow down module loading, but since common devices should be
      built in anyway, and since the bug is really annoying and hard to handle
      from user space (and caused several S3 resume regressions), the simple
      fix to wait is the right one.
      
      This fixes at least
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=13063
      
      but probably a few other bugzilla entries too (12936, for example), and
      is confirmed to fix Rafael's storage driver breakage after resume bug
      report (no bugzilla entry).
      
      We should also be able to now revert that ACPI battery fix.
      Reported-and-tested-by: Rafael J. Wysocki <rjw@suse.com>
      Tested-by: Heinz Diehl <htd@fancy-poultry.org>
      Acked-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 10 Apr, 2009 8 commits
    • Z
      ftrace: Output REC->var instead of __entry->var for trace format · 0462b566
      Zhaolei committed
      print fmt: "irq=%d return=%s", __entry->irq, __entry->ret ? \"handled\" : \"unhandled\"
      
      "__entry" should be converted to "REC" by the __stringify() macro.
      Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <49DC679D.2090901@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing: fix document references · 4d1f4372
      Li Zefan committed
      When moving documents to Documentation/trace/, I forgot to
      grep Kconfig to find out those references.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Pekka Paalanen <pq@iki.fi>
      Cc: eduard.munteanu@linux360.ro
      LKML-Reference: <49DE97EF.7080208@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • F
      tracing/lockdep: report the time waited for a lock · 2062501a
      Frederic Weisbecker committed
      While trying to optimize the new lock on reiserfs to replace
      the bkl, I find the lock tracing very useful though it lacks
      something important for performance (and latency) instrumentation:
      the time a task waits for a lock.
      
      That's what this patch implements:
      
        bash-4816  [000]   202.652815: lock_contended: lock_contended: &sb->s_type->i_mutex_key
        bash-4816  [000]   202.652819: lock_acquired: &rq->lock (0.000 us)
       <...>-4787  [000]   202.652825: lock_acquired: &rq->lock (0.000 us)
       <...>-4787  [000]   202.652829: lock_acquired: &rq->lock (0.000 us)
        bash-4816  [000]   202.652833: lock_acquired: &sb->s_type->i_mutex_key (16.005 us)
      
      As shown above, the "lock_acquired" field is followed by the time
      the task waited for the lock. Usually, a lock_contended entry
      is followed by a nearby lock_acquired entry with a non-zero wait time.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1238975373-15739-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
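The wait-time figure reported by 2062501a can be modelled as the difference between a timestamp taken at contention and one taken at acquisition. The sketch below is an assumption about the bookkeeping, not the lockdep code: `struct held_lock_model` and the three functions are hypothetical, and timestamps are plain nanosecond counters supplied by the caller.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-lock state: when the task started waiting. */
struct held_lock_model {
	uint64_t waittime_stamp;	/* 0 means "not contended" */
};

/* Called when the lock is found contended. */
static void lock_contended_model(struct held_lock_model *l, uint64_t now_ns)
{
	l->waittime_stamp = now_ns;
}

/* Called on acquisition; returns the time waited in nanoseconds. */
static uint64_t lock_acquired_model(struct held_lock_model *l, uint64_t now_ns)
{
	uint64_t waited = 0;

	if (l->waittime_stamp) {
		waited = now_ns - l->waittime_stamp;
		l->waittime_stamp = 0;
	}
	return waited;
}

/* Format as microseconds with three decimals, e.g. "16.005 us". */
static int format_wait(char *buf, size_t len, uint64_t waited_ns)
{
	return snprintf(buf, len, "%llu.%03llu us",
			(unsigned long long)(waited_ns / 1000),
			(unsigned long long)(waited_ns % 1000));
}
```

An uncontended acquisition yields zero, which matches the many `(0.000 us)` entries in the sample trace.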
    • L
      tracing: fix splice return too large · 93cfb3c9
      Lai Jiangshan committed
      I got these from strace:
      
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
       splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
      
      I wanted to splice_read 4096 bytes, but it returns 8192 or larger.
      
      It is because the return value of tracing_buffers_splice_read()
      does not include "zero out any left over data" bytes.
      
      But tracing_buffers_read() does include these bytes, so we make the two
      consistent.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing: update file->f_pos when splice(2) it · c7625a55
      Lai Jiangshan committed
      Impact: Cleanup
      
      These two lines:
      
      	if (unlikely(*ppos))
      		return -ESPIPE;
      
      in tracing_buffers_splice_read() are not needed, because the VFS layer
      has disabled seek(2).
      
      We remove these two lines, and then we can update file->f_pos.
      
      tracing_buffers_read() already updates file->f_pos; this fix
      makes tracing_buffers_splice_read() update file->f_pos too.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing: allocate page when needed · ddd538f3
      Lai Jiangshan committed
      Impact: Cleanup
      
      Sometimes we open trace_pipe_raw but never read(2) it and only
      splice(2) it, so the page is not used.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • L
      tracing: disable seeking for trace_pipe_raw · d1e7e02f
      Lai Jiangshan committed
      Impact: disable pread()
      
      We set tracing_buffers_fops.llseek to no_llseek,
      but pread() can still be used to read this file.
      
      That is not expected.
      
      This fix uses nonseekable_open() to disable it.
      
      tracing_buffers_fops.llseek is still set to no_llseek;
      it marks this file as a "non-seekable device" and is used by
      sys_splice(). See also do_splice() or the manual page of splice(2):
      
      ERRORS
             EINVAL Target file system doesn't support  splicing;
                    neither  of the descriptors refers to a pipe;
                    or offset given for non-seekable device.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d1e7e02f
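The effect described in this commit can be observed from user space on any non-seekable file. The following is a minimal sketch, not the kernel change itself: it uses an ordinary pipe as a stand-in for trace_pipe_raw (an assumption made so the example runs without root or debugfs), and the helper name pread_errno_on_pipe() is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: tries pread(2) on the read end of a fresh pipe and
 * returns the errno it reports, or 0 if pread() unexpectedly succeeds.
 * A pipe stands in here for a file opened with nonseekable_open().
 */
static int pread_errno_on_pipe(void)
{
	int fds[2];
	char buf[4];
	int err = 0;

	if (pipe(fds) != 0)
		return errno;

	/* Put data in the pipe so that a plain read(2) would succeed. */
	if (write(fds[1], "abcd", 4) != 4) {
		err = errno;
		goto out;
	}

	/* A positional read needs a seekable file, so this should fail. */
	if (pread(fds[0], buf, sizeof(buf), 0) < 0)
		err = errno;
out:
	close(fds[0]);
	close(fds[1]);
	return err;
}
```

Calling pread_errno_on_pipe() is expected to return ESPIPE, which is the error the kernel reports for pread() on a file whose open routine cleared positional-read support.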
    • H
      mutex: have non-spinning mutexes on s390 by default · 36cd3c9f
      Heiko Carstens committed
      Impact: performance regression fix for s390
      
      The adaptive spinning mutexes will not always do what one would expect on
      virtualized architectures like s390. Especially the cpu_relax() loop in
      mutex_spin_on_owner might hurt if the mutex holding cpu has been scheduled
      away by the hypervisor.
      
      We would end up in a cpu_relax() loop when there is no chance that the
      state of the mutex changes until the target cpu has been scheduled again by
      the hypervisor.
      
      For that reason we should change the default behaviour to no-spin on s390.
      
      We do have an instruction which allows us to yield the current cpu in favour of
      a different target cpu. Also we have an instruction which allows us to figure
      out if the target cpu is physically backed.
      
      However we need to do some performance tests before we can come up with
      a solution that will do the right thing on s390.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      LKML-Reference: <20090409184834.7a0df7b2@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      36cd3c9f
  5. 09 Apr, 2009 6 commits
    • L
      blktrace: pass the right pointer to kfree() · 9eb85125
      Li Zefan committed
      Impact: fix kfree crash with non-standard act_mask string
      
      If a string with leading whitespace is passed to strstrip(),
      the returned pointer differs from the original pointer, so it must
      not be passed to kfree().
      
      This bug was introduced by me.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <49DD694C.8020902@cn.fujitsu.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      9eb85125
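The pointer mismatch behind this fix can be reproduced in user space. The sketch below is an illustration under stated assumptions, not the blktrace code: strip() is a hypothetical stand-in written to behave like the kernel's strstrip(), and malloc()/free() stand in for kmalloc()/kfree().

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for strstrip(): trims trailing
 * whitespace in place and returns a pointer to the first non-blank
 * character, which may differ from s.
 */
static char *strip(char *s)
{
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	while (len > 0 && isspace((unsigned char)s[len - 1]))
		s[--len] = '\0';
	while (*s && isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Copies input, strips the copy, and reports how far the stripped
 * pointer moved from the start of the allocation.
 * Returns -1 if the allocation fails.
 */
static long strip_offset(const char *input)
{
	char *buf = malloc(strlen(input) + 1);	/* the pointer we own */
	char *s;
	long off;

	if (!buf)
		return -1;
	strcpy(buf, input);

	s = strip(buf);		/* s may point into the middle of buf */
	off = (long)(s - buf);

	free(buf);		/* correct: free the original pointer, not s */
	return off;
}
```

For an input with two leading spaces the reported offset is 2, so freeing the stripped pointer instead of the original one would hand the allocator an address it never returned.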
    • F
      tracing/syscalls: use a dedicated file header · 47788c58
      Frederic Weisbecker committed
      Impact: fix build warnings and possible compat misbehavior on IA64
      
      Building a kernel on ia64 might trigger these ugly build warnings:
      
      CC      arch/ia64/ia32/sys_ia32.o
      In file included from arch/ia64/ia32/sys_ia32.c:55:
      arch/ia64/ia32/ia32priv.h:290:1: warning: "elf_check_arch" redefined
      In file included from include/linux/elf.h:7,
                       from include/linux/module.h:14,
                       from include/linux/ftrace.h:8,
                       from include/linux/syscalls.h:68,
                       from arch/ia64/ia32/sys_ia32.c:18:
      arch/ia64/include/asm/elf.h:19:1: warning: this is the location of the previous definition
      [...]
      
      sys_ia32.c includes linux/syscalls.h which in turn includes linux/ftrace.h
      to import the syscalls tracing prototypes.
      
      But including ftrace.h can pull in too many things for a low level file,
      especially on ia64 where the ia32 private headers conflict with higher
      level headers.
      
      Now we isolate the syscall tracing headers in their own lightweight file.
      Reported-by: Tony Luck <tony.luck@intel.com>
      Tested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: "Frank Ch. Eigler" <fche@redhat.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Jiaying Zhang <jiayingz@google.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: Martin Bligh <mbligh@google.com>
      Cc: Michael Davidson <md@google.com>
      LKML-Reference: <20090408184058.GB6017@nowhere>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      47788c58
    • A
      work_on_cpu(): rewrite it to create a kernel thread on demand · 6b44003e
      Andrew Morton committed
      Impact: circular locking bugfix
      
      The various implementations and proposed implementations of work_on_cpu()
      are vulnerable to various deadlocks because they all used queues of some
      form.
      
      Unrelated pieces of kernel code thus gained dependencies wherein if one
      work_on_cpu() caller holds a lock which some other work_on_cpu() callback
      also takes, the kernel could rarely deadlock.
      
      Fix this by creating a short-lived kernel thread for each work_on_cpu()
      invocation.
      
      This is not terribly fast, but the only current caller of work_on_cpu() is
      pci_call_probe().
      
      It would be nice to find some other way of doing the node-local
      allocations in the PCI probe code so that we can zap work_on_cpu()
      altogether.  The code there is rather nasty.  I can't think of anything
      simple at this time...
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      6b44003e
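The thread-per-invocation idea from this commit can be sketched in user space with POSIX threads. This is an analogy under stated assumptions, not the kernel implementation: work_on_cpu_sketch(), struct wfc, do_work_for_cpu() and double_it() are hypothetical names, and pthread_attr_setaffinity_np() (a GNU extension) stands in for binding a kernel thread to a CPU.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Hypothetical request record: the function to run, its argument, its result. */
struct wfc {
	long (*fn)(void *);
	void *arg;
	long ret;
};

/* Thread entry point: runs the requested function and stores its result. */
static void *do_work_for_cpu(void *data)
{
	struct wfc *wfc = data;

	wfc->ret = wfc->fn(wfc->arg);
	return NULL;
}

/*
 * Runs fn(arg) on a short-lived thread pinned to cpu and waits for it.
 * Returns fn's result, or a negated error number if the thread could
 * not be set up. No shared queue is involved, so unrelated callers do
 * not wait behind each other's work.
 */
static long work_on_cpu_sketch(int cpu, long (*fn)(void *), void *arg)
{
	struct wfc wfc = { .fn = fn, .arg = arg, .ret = 0 };
	pthread_attr_t attr;
	pthread_t thread;
	cpu_set_t set;
	int err;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	err = pthread_attr_init(&attr);
	if (err)
		return -err;

	/* Pin the new thread before it starts running. */
	err = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (!err)
		err = pthread_create(&thread, &attr, do_work_for_cpu, &wfc);
	pthread_attr_destroy(&attr);
	if (err)
		return -err;

	pthread_join(thread, NULL);	/* the thread exists only for this call */
	return wfc.ret;
}

/* Example callback: doubles the long that arg points to. */
static long double_it(void *arg)
{
	return 2 * *(long *)arg;
}
```

A caller would pass a CPU it is allowed to run on, for example the value returned by sched_getcpu(). As the commit message notes for the kernel version, creating a thread per call is not fast, so the pattern only suits rare callers.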
    • O
      kthread: move sched-related initialization from kthreadd context · 1c99315b
      Oleg Nesterov committed
      kthreadd is the single thread which implements the "create" request, move
      sched_setscheduler/etc from create_kthread() to kthread_create() to
      improve the scalability.
      
      We should be careful with sched_setscheduler(), use _nocheck helper.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: Vitaliy Gusev <vgusev@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      1c99315b
    • V
      kthread: Don't look for a task in create_kthread() #2 · 3217ab97
      Vitaliy Gusev committed
      Remove the unnecessary find_task_by_pid_ns(). kthread() can just
      use "current" to get the same result.
      Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
      Acked-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      3217ab97
    • R
      ptrace: some checkpatch fixes · 3a709703
      Roland McGrath committed
      This fixes all the checkpatch --file complaints about kernel/ptrace.c
      and also removes an unused #include.  I've verified that there are no
      changes to the compiled code on x86_64.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      [ Removed the parts that just split a line  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3a709703
  6. 08 Apr, 2009 2 commits