1. 13 Mar, 2010 5 commits
    • tracing: Do not record user stack trace from NMI context · b6345879
      Steven Rostedt committed
      A bug was found with Li Zefan's ftrace_stress_test that caused applications
      to segfault during the test.
      
      Placing a tracing_off() in the segfault code, and examining several
      traces, I found that the following was always the case. The lock tracer
      was enabled (lockdep being required) and userstack was enabled. Testing
      this out, I just enabled the two, but that was not good enough. I needed
      to run something else that could trigger it. Running a load like hackbench
      did not work, but executing a new program would. The following would
      trigger the segfault within seconds:
      
        # echo 1 > /debug/tracing/options/userstacktrace
        # echo 1 > /debug/tracing/events/lock/enable
        # while :; do ls > /dev/null ; done
      
      Enabling the function graph tracer and looking at what was happening,
      I finally noticed that all crashes happened just after an NMI.
      
       1)               |    copy_user_handle_tail() {
       1)               |      bad_area_nosemaphore() {
       1)               |        __bad_area_nosemaphore() {
       1)               |          no_context() {
       1)               |            fixup_exception() {
       1)   0.319 us    |              search_exception_tables();
       1)   0.873 us    |            }
      [...]
       1)   0.314 us    |  __rcu_read_unlock();
       1)   0.325 us    |    native_apic_mem_write();
       1)   0.943 us    |  }
       1)   0.304 us    |  rcu_nmi_exit();
      [...]
       1)   0.479 us    |  find_vma();
       1)               |  bad_area() {
       1)               |    __bad_area() {
      
      After capturing several traces of failures, all of them happened
      after an NMI. Curious about this, I added a trace_printk() to the NMI
      handler to read the regs->ip to see where the NMI happened. I found
      that it was here:
      
      ffffffff8135b660 <page_fault>:
      ffffffff8135b660:       48 83 ec 78             sub    $0x78,%rsp
      ffffffff8135b664:       e8 97 01 00 00          callq  ffffffff8135b800 <error_entry>
      
      What was happening is that the NMI would happen at the place that a page
      fault occurred. It would call rcu_read_lock() which was traced by
      the lock events, and the user_stack_trace would run. This would trigger
      a page fault inside the NMI. I do not see where the CR2 register is
      saved or restored in NMI handling. This means that it would corrupt
      the page fault handling that the NMI interrupted.
      
      The reason the while loop of ls helped trigger the bug was that
      each execution of ls would cause lots of pages to be faulted in, and
      increase the chances of the race happening.
      
      The simple solution is to not allow user stack traces in NMI context.
      After this patch, I ran the above "ls" test for a couple of hours
      without any issues. Without this patch, the bug would trigger in less
      than a minute.
      
      Cc: stable@kernel.org
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
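The guard described in this commit can be pictured with a small user-space model. This is only a sketch under stated assumptions, not the kernel patch: sim_in_nmi, sim_save_user_stack() and sim_trace_userstack() are hypothetical stand-ins for the kernel's NMI state and user stack tracing path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real kernel API. */
bool sim_in_nmi;        /* models "we are currently in NMI context" */
int sim_user_traces;    /* counts user stack traces that were recorded */

/* Models the user stack walk, which may page-fault on user memory. */
void sim_save_user_stack(void)
{
    /* A fault taken here inside an NMI would disturb the interrupted fault. */
    assert(!sim_in_nmi);
    sim_user_traces++;
}

/* Record a user stack trace unless we are in NMI context. */
void sim_trace_userstack(void)
{
    if (sim_in_nmi)
        return;         /* skip the trace rather than risk a nested fault */
    sim_save_user_stack();
}
```

The early return trades a missing user stack trace in NMI context for never touching user memory there.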
    • tracing: Disable buffer switching when starting or stopping trace · a2f80714
      Steven Rostedt committed
      When the trace iterator is read, tracing_start() and tracing_stop()
      are called to stop tracing while the iterator is processing the trace
      output.
      
      These functions disable both the standard buffer and the max latency
      buffer. But if the wakeup tracer is running, it can switch these
      buffers between the two disables:
      
        buffer = global_trace.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
            <<<--------- swap happens here
      
        buffer = max_tr.buffer;
        if (buffer)
            ring_buffer_record_disable(buffer);
      
      What happens is that we disable the same buffer twice. On tracing_start()
      we can enable the same buffer twice. Every ring_buffer_record_disable()
      must be matched with a ring_buffer_record_enable(), or the buffer
      can be disabled permanently, or enabled prematurely, and cause a bug
      where a reset happens while a trace is committing.
      
      This patch protects these two by taking the ftrace_max_lock to prevent
      a switch from occurring.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
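The effect of holding a lock across both disables can be modelled in user space. The following is a single-threaded sketch under stated assumptions: all sim_ names are hypothetical, and the lock is modelled as a flag that makes a swap attempt fail, whereas a real spinlock would make the swapper wait.

```c
#include <assert.h>

/* Models a ring buffer's disable counter. */
struct sim_buffer { int record_disabled; };

struct sim_tracer {
    struct sim_buffer *buffer;      /* models global_trace.buffer */
    struct sim_buffer *max_buffer;  /* models max_tr.buffer */
    int max_lock_held;              /* models ftrace_max_lock being held */
};

/* Models the wakeup tracer swapping the buffers; refused while locked. */
int sim_try_swap(struct sim_tracer *t)
{
    struct sim_buffer *tmp;

    if (t->max_lock_held)
        return 0;
    tmp = t->buffer;
    t->buffer = t->max_buffer;
    t->max_buffer = tmp;
    return 1;
}

/* Disable both buffers with the lock held so no swap can interleave. */
void sim_tracing_stop(struct sim_tracer *t, int swap_attempt)
{
    assert(t->buffer && t->max_buffer);
    t->max_lock_held = 1;
    t->buffer->record_disabled++;
    if (swap_attempt)
        sim_try_swap(t);            /* rejected: the lock is held */
    t->max_buffer->record_disabled++;
    t->max_lock_held = 0;
}
```

With the swap blocked between the two increments, each buffer is disabled exactly once, so a later enable of each buffer restores the counters to zero.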
    • tracing: Use same local variable when resetting the ring buffer · 283740c6
      Steven Rostedt committed
      The ftrace code that resets the ring buffer references the
      buffer through a local variable, but then passes tr->buffer as the
      parameter to the reset. If the wakeup tracer is running, which can
      switch tr->buffer with the max saved buffer, this can break
      the requirement of disabling the buffer before the reset.
      
         buffer = tr->buffer;
         ring_buffer_record_disable(buffer);
         synchronize_sched();
         __tracing_reset(tr->buffer, cpu);
      
      If the tr->buffer is swapped, then the reset is not happening to the
      buffer that was disabled. This will cause the ring buffer to fail.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
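A user-space model of the corrected pattern, where the buffer that was disabled is also the one that is reset. This is a sketch only: the sim_ types and functions are hypothetical, and the swap_with parameter simulates a buffer swap happening during the synchronize_sched() window.

```c
#include <assert.h>

struct sim_buffer { int record_disabled; int entries; };
struct sim_trace_array { struct sim_buffer *buffer; };

/* A reset is only legal on a buffer whose recording is disabled. */
void sim_reset(struct sim_buffer *buf)
{
    assert(buf->record_disabled > 0);
    buf->entries = 0;
}

void sim_tracing_reset(struct sim_trace_array *tr, struct sim_buffer *swap_with)
{
    struct sim_buffer *buffer = tr->buffer;  /* take the pointer once */

    buffer->record_disabled++;
    if (swap_with)                           /* simulated swap in the window */
        tr->buffer = swap_with;
    sim_reset(buffer);                       /* local copy, not tr->buffer */
    buffer->record_disabled--;
}
```

Had sim_reset() been given tr->buffer after the simulated swap, the assertion would fire, because that buffer was never disabled.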
    • function-graph: Init curr_ret_stack with ret_stack · ea14eb71
      Steven Rostedt committed
      If the graph tracer is active and a task is forked but the allocation of
      the process's graph stack fails, it can cause a crash later on.
      
      This is due to the temporary stack being NULL, but the curr_ret_stack
      variable is copied from the parent. If it is not -1, then in
      ftrace_graph_probe_sched_switch() the following:
      
      	for (index = next->curr_ret_stack; index >= 0; index--)
      		next->ret_stack[index].calltime += timestamp;
      
      will cause a kernel oops.
      
      Found with Li Zefan's ftrace_stress_test.
      
      Cc: stable@kernel.org
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
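The failure mode and one way to avoid it can be modelled in user space. This is a sketch under stated assumptions, not the actual patch: sim_task, sim_graph_init_task() and sim_sched_switch() are hypothetical, and resetting the index to -1 at init time is the assumed fix.

```c
#include <assert.h>
#include <stddef.h>

struct sim_ret_entry { unsigned long long calltime; };

struct sim_task {
    struct sim_ret_entry *ret_stack;  /* NULL if the allocation failed */
    int curr_ret_stack;               /* index of top entry, -1 when empty */
};

/* Initialise the child's graph state; 'stack' may be NULL on failure. */
void sim_graph_init_task(struct sim_task *child, struct sim_ret_entry *stack)
{
    child->ret_stack = stack;
    child->curr_ret_stack = -1;       /* never inherit the parent's index */
}

/* Mirrors the loop quoted in the commit message; returns entries touched. */
int sim_sched_switch(struct sim_task *next, unsigned long long timestamp)
{
    int index, touched = 0;

    for (index = next->curr_ret_stack; index >= 0; index--) {
        assert(next->ret_stack != NULL);
        next->ret_stack[index].calltime += timestamp;
        touched++;
    }
    return touched;
}
```

With the index reset to -1, the loop body never runs for a task whose stack allocation failed, so the NULL pointer is never dereferenced.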
    • ring-buffer: Move disabled check into preempt disable section · 52fbe9cd
      Lai Jiangshan committed
      The ring buffer resizing and resetting relies on a schedule RCU
      action. The buffers are disabled, a synchronize_sched() is called
      and then the resize or reset takes place.
      
      But this only works if the check for disabled buffers is within the
      preempt disabled section; otherwise a window exists in which the buffers
      can be written to while a reset or resize takes place.
      
      Cc: stable@kernel.org
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B949E43.2010906@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
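The ordering this commit asks for can be shown with a small user-space model. It is a single-threaded sketch with hypothetical sim_ names: a counter stands in for preempt_disable()/preempt_enable(), and an assertion documents that the disabled check must run inside that section.

```c
#include <assert.h>

int sim_preempt_count;    /* > 0 models a preempt-disabled section */
int sim_record_disabled;  /* models the buffer's disable counter */
int sim_entries;          /* events written so far */

/* The disabled check; only valid inside a preempt-disabled section. */
int sim_buffer_is_disabled(void)
{
    assert(sim_preempt_count > 0);
    return sim_record_disabled != 0;
}

/* Try to write one event; returns 1 on success, 0 if the buffer is disabled. */
int sim_write_event(void)
{
    int written = 0;

    sim_preempt_count++;              /* models preempt_disable() */
    if (!sim_buffer_is_disabled()) {  /* check and write share one section */
        sim_entries++;
        written = 1;
    }
    sim_preempt_count--;              /* models preempt_enable() */
    return written;
}
```

Because the check and the write sit in the same section, a writer that saw the buffer enabled finishes before a synchronize_sched()-style wait returns, and later writers see it disabled.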
  2. 11 Mar, 2010 1 commit
  3. 06 Mar, 2010 4 commits
    • function-graph: Add tracing_thresh support to function_graph tracer · 0e950173
      Tim Bird committed
      Add support for tracing_thresh to the function_graph tracer.  This
      version of this feature isolates the checks into new entry and
      return functions, to avoid adding more conditional code into the
      main function_graph paths.
      
      When the tracing_thresh is set and the function graph tracer is
      enabled, only the functions that took longer than the time in
      microseconds that was set in tracing_thresh are recorded. To do this
      efficiently, only the function exits are recorded:
      
       [tracing]# echo 100 > tracing_thresh
       [tracing]# echo function_graph > current_tracer
       [tracing]# cat trace
       # tracer: function_graph
       #
       # CPU  DURATION                  FUNCTION CALLS
       # |     |   |                     |   |   |   |
        1) ! 119.214 us  |  } /* smp_apic_timer_interrupt */
        1)   <========== |
        0) ! 101.527 us  |              } /* __rcu_process_callbacks */
        0) ! 126.461 us  |            } /* rcu_process_callbacks */
        0) ! 145.111 us  |          } /* __do_softirq */
        0) ! 149.667 us  |        } /* do_softirq */
        0) ! 168.817 us  |      } /* irq_exit */
        0) ! 248.254 us  |    } /* smp_apic_timer_interrupt */
      
      Also, add support for specifying tracing_thresh on the kernel
      command line.  When used like so: "tracing_thresh=200 ftrace=function_graph"
      this can be used to analyse system startup.  It is important to disable
      tracing soon after boot, in order to avoid losing the trace data.

      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Tim Bird <tim.bird@am.sony.com>
      LKML-Reference: <4B87098B.4040308@am.sony.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
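The threshold filter on function exit can be sketched in user space as follows. The names are hypothetical and the nanosecond unit is an assumption made for the model. Following the wording above, a function is recorded only if it took strictly longer than the threshold.

```c
#include <assert.h>

unsigned long long sim_tracing_thresh_ns;  /* 0 means no threshold is set */
int sim_recorded;                          /* number of exits recorded */

/* Called on function return; keeps only slow functions when a threshold is set. */
void sim_graph_thresh_return(unsigned long long calltime_ns,
                             unsigned long long rettime_ns)
{
    assert(rettime_ns >= calltime_ns);
    if (sim_tracing_thresh_ns &&
        rettime_ns - calltime_ns <= sim_tracing_thresh_ns)
        return;                            /* too fast: drop the event */
    sim_recorded++;
}
```

Only the exit needs recording because the duration is known only at return time, which matches the trace output above showing closing braces alone.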
    • tracing: Update the comm field in the right variable in update_max_tr · 1acaa1b2
      Arnaldo Carvalho de Melo committed
      The latency output showed:
      
       #    | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The comm is missing in the "task:" and it looks like a minus 3 is
      the output. The correct display should be:
      
       #    | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)
      
      The problem is that the comm is being stored in the wrong data
      structure. The max_tr.data[cpu] is what stores the comm, not the
      tr->data[cpu].
      
      Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
      ended up showing just the '-' sign followed by the pid.
      
      Also remove a needless initialization of max_data.

      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
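A user-space model of why the comm must be written to the structure that the latency header is printed from. This is a sketch with hypothetical names (sim_cpu_data, sim_update_max(), sim_print_task()), not the kernel's data layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIM_COMM_LEN 16

/* Models the per-CPU data of the max-latency snapshot. */
struct sim_cpu_data { char comm[SIM_COMM_LEN]; int pid; };

/* Save the task name into the snapshot that the header is printed from. */
void sim_update_max(struct sim_cpu_data *max_data, const char *comm, int pid)
{
    assert(strlen(comm) < SIM_COMM_LEN);
    strcpy(max_data->comm, comm);
    max_data->pid = pid;
}

/* Format the "task:" part of the latency header. */
int sim_print_task(char *out, size_t len, const struct sim_cpu_data *max_data)
{
    return snprintf(out, len, "task: %s-%d", max_data->comm, max_data->pid);
}
```

If the comm were stored elsewhere, max_data->comm would stay empty and the same format string would print "task: -3", which is the symptom described above.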
    • function-graph: Use comment notation for func names of dangling '}' · a094fe04
      Steven Rostedt committed
      When a '}' does not have a matching function start, the name is printed
      within parentheses. But this makes it hard to distinguish an ending '}'
      from a function start. This patch makes the function name appear in C
      comment notation.
      
      Old view:
       3)   1.281 us    |            } (might_fault)
       3)   3.620 us    |          } (filldir)
       3)   5.251 us    |        } (call_filldir)
       3)               |        call_filldir() {
       3)               |          filldir() {
      
      New view:
       3)   1.281 us    |            } /* might_fault */
       3)   3.620 us    |          } /* filldir */
       3)   5.251 us    |        } /* call_filldir */
       3)               |        call_filldir() {
       3)               |          filldir() {

      Requested-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
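The new notation amounts to a change of format string. A minimal user-space sketch follows, assuming a hypothetical helper sim_format_close() and two spaces of indentation per nesting level. It does not reproduce the tracer's CPU and duration columns.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format a closing brace at the given nesting depth.
 * 'func' is non-NULL only when the matching function start is missing.
 */
int sim_format_close(char *out, size_t len, int depth, const char *func)
{
    assert(depth >= 0);
    if (func)
        return snprintf(out, len, "%*s} /* %s */", depth * 2, "", func);
    return snprintf(out, len, "%*s}", depth * 2, "");
}
```

A C comment cannot be mistaken for a call such as "filldir() {", which is the ambiguity the old parenthesised form had.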
    • function-graph: Fix unused reference to ftrace_set_func() · 801c29fd
      Steven Rostedt committed
      The declaration of ftrace_set_func() is at the start of the ftrace.c file
      and wrapped with a #ifdef CONFIG_FUNCTION_GRAPH condition. If function
      graph tracing is enabled but CONFIG_DYNAMIC_FTRACE is not, a warning
      about that function being declared static and unused is given.
      
      This really should have been placed within the CONFIG_FUNCTION_GRAPH
      condition that uses ftrace_set_func().
      
      Moving the declaration down fixes the warning and makes the code cleaner.

      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  4. 04 Mar, 2010 1 commit
  5. 03 Mar, 2010 5 commits
    • tracing: Fix warning in s_next of trace file ops · ac91d854
      Lai Jiangshan committed
      This warning in s_next() can be triggered by lseek():
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3c1>] warn_slowpath_common+0x81/0xa0
       [<c018b3f7>] ? s_next+0x77/0x80
       [<c013e3fa>] warn_slowpath_null+0x1a/0x20
       [<c018b3f7>] s_next+0x77/0x80
       [<c01efa77>] traverse+0x117/0x200
       [<c01eff13>] seq_lseek+0xa3/0x120
       [<c01efe70>] ? seq_lseek+0x0/0x120
       [<c01d7081>] vfs_llseek+0x41/0x50
       [<c01d8116>] sys_llseek+0x66/0xa0
       [<c0102bd0>] sysenter_do_call+0x12/0x26
      
      The iterator "leftover" variable is zeroed in the opening of the trace
      file. But lseek can call s_start() which will call s_next() without
      resetting the "leftover" variable back to zero, which might trigger
      the WARN_ON_ONCE(iter->leftover) that is in s_next().
      
      Cc: stable@kernel.org
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
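The interaction between the start and next callbacks can be modelled in user space. This is a sketch with hypothetical names: sim_s_start() and sim_s_next() only imitate the shape of the seq_file callbacks, and clearing leftover before replaying is the assumed fix.

```c
#include <assert.h>

struct sim_iter {
    int leftover;   /* nonzero when a previous read left unflushed output */
    long pos;       /* current position of the iterator */
};

/* Models s_next(); it must never run with stale leftover state. */
long sim_s_next(struct sim_iter *it)
{
    assert(it->leftover == 0);  /* stands in for WARN_ON_ONCE(iter->leftover) */
    return ++it->pos;
}

/* Models s_start() restarting and walking forward to 'pos', as lseek() can cause. */
long sim_s_start(struct sim_iter *it, long pos)
{
    long cur = 0;

    it->leftover = 0;           /* reset before replaying from the beginning */
    it->pos = 0;
    while (cur < pos)
        cur = sim_s_next(it);
    return cur;
}
```

Without the reset in sim_s_start(), an iterator that still carried leftover from an earlier read would trip the assertion on the first replayed step.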
    • Merge branch 'for-linus' of git://gitorious.org/linux-omap-dss2/linux · 3a5b27bf
      Linus Torvalds committed
      * 'for-linus' of git://gitorious.org/linux-omap-dss2/linux: (49 commits)
        OMAP: DSS2: Taal: Fix TE when resuming
        OMAP: DSS2: Taal: Fix ESD check
        OMAP: DSS2: OMAPFB: Constify some function parameters
        OMAP: DSS2: OMAPFB: install omapfb.h
        OMAP: DSS2: DSI: add error prints
        OMAP: DSS2: TPO-TD03MTEA1: fix function names
        OMAP: DSS2: DSI: add dsi_vc_dcs_read_2() helper
        OMAP: DSS2: OMAPFB: Remove FB_OMAP2_FORCE_AUTO_UPDATE
        OMAP: DSS2: DSI: remove external TE support
        OMAP: DSS2: move timing functions
        OMAP: DSS2: move set/get_wss()
        OMAP: DSS2: move enable/disable/suspend/resume
        OMAP: DSS2: move update() and sync()
        OMAP: DSS2: move set/get_update_mode()
        OMAP: DSS2: move enable/get_te()
        OMAP: DSS2: move get_recommended_bpp()
        OMAP: DSS2: move get_resolution()
        OMAP: DSS2: move enable/disable_channel to overlay manager
        OMAP: DSS2: move wait_vsync()
        OMAP: DSS2: move get/set_rotate()
        ...
    • Merge branch 'omap-for-linus' of... · 5057bfaf
      Linus Torvalds committed
      Merge branch 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6
      
      * 'omap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap-2.6: (214 commits)
        omap2: Initialize Menelaus and MMC for N8X0
        AM3517 EVM: correct typo - tca6416 mispelt as tca6516
        AM3517 EVM: Enable I2C support
        AM35x: Enable OMAP_MUX in defconfig
        AM35x: Add missing GPIO mux config for EHCI port
        Zoom3: Defconfig update
        omap: i2c: Fix muxing for command line enabled bus
        OMAP4: clock: Remove clock hacks from timer-gp.c
        OMAP4: clock: Add dummy clock nodes for interface clocks
        OMAP4: clock: Rename leaf clock nodes to end with a _ick or _fck
        OMAP2+ clock: revise omap2_clk_{disable,enable}()
        OMAP2/3 clock: combine OMAP2 & 3 boot-time MPU rate change code
        OMAP clockdomain: if no autodeps exist, don't try to add or remove them
        OMAP hwmod: add hwmod class support
        OMAP hwmod: convert header files with static allocations into C files
        OMAP hwmod: convert hwmod to use hardware clock names rather than clkdev dev+con
        OMAP clock: add omap_clk_get_by_name() for use by OMAP hwmod core code
        OMAP3: clock: add capability to change rate of dpll4_m5_ck_3630
        OMAP4 clock: drop the ALWAYS_ENABLED clock flag
        OMAP clock: drop RATE_FIXED clock flag
        ...
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 6c0ad5df
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        Revert "blkdev: fix merge_bvec_fn return value checks"
    • Revert "blkdev: fix merge_bvec_fn return value checks" · 9599945b
      Jens Axboe committed
      This reverts commit 9f7cdbc3.
      
      It's causing oopses on dm setups, so revert it until we investigate.

      Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  6. 02 Mar, 2010 24 commits