1. 10 Mar, 2009 1 commit
  2. 09 Mar, 2009 2 commits
    • tracing: optimize trace_printk() · 7bffc23e
      Committed by Ingo Molnar
      Impact: micro-optimization
      
      trace_printk() does this unconditionally:
      
      	trace_printk_fmt = fmt;
      
      Where trace_printk_fmt is an entry into a global array. This is
      very SMP-unfriendly.
      
      So only write it once per bootup.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
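The write-once idea in the commit above can be sketched in user-space C. This is an illustrative sketch only, not the kernel macro: `toy_fmt_slot`, `toy_slot_writes`, and `toy_record_fmt()` are hypothetical names, and the single slot stands in for one entry of the format table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-call-site slot in the format table. */
static const char *toy_fmt_slot;
/* Counts actual stores to the slot; exists only to make the effect visible. */
static unsigned long toy_slot_writes;

/*
 * Record the format pointer for a call site. The store happens only while
 * the slot is still empty, so later calls merely read the shared slot
 * instead of dirtying its cache line on every invocation.
 */
static void toy_record_fmt(const char *fmt)
{
	assert(fmt != NULL);
	if (!toy_fmt_slot) {
		toy_fmt_slot = fmt;
		toy_slot_writes++;
	}
}
```

Calling `toy_record_fmt()` many times with the same format performs a single store; every later call is a read-only check, which is what makes the pattern friendlier to SMP systems.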
    • tracing: trace_printk() fix, move format array to data section · 8a20d84d
      Committed by Ingo Molnar
      Impact: fix kernel crash when using trace_printk()
      
      The trace_printk_fmt section is defined in the read-only section.
      But we do:
      
      	trace_printk_fmt = fmt;
      
      to fill in that table of format strings - which is not a read-only operation.
      Under CONFIG_DEBUG_RODATA=y this crashes ...
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 07 Mar, 2009 4 commits
    • tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() · 769b0441
      Committed by Frederic Weisbecker
      Impact: faster and lighter tracing
      
      Now that we have trace_bprintk(), which is faster and consumes less
      memory than trace_printk() and has the same purpose, we can drop
      the old implementation in favour of the binary one from trace_bprintk().
      This means we move all of the implementation of trace_bprintk() to
      trace_printk(), so the API doesn't change, except that we must now use
      trace_seq_bprintk() to print the TRACE_PRINT entries.
      
      Some changes result from this:
      
      - Previously, trace_bprintk depended on a single tracer and couldn't
        work without it. This tracer has been dropped and the whole implementation
        of trace_printk() (like the module formats management) is now integrated
        in the tracing core (comes with CONFIG_TRACING), though we keep the file
        trace_printk.c (previously trace_bprintk.c) where we can find the module
        management. Thus we don't overflow trace.c
      
      - Change some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.
      
      - Slightly change the trace_printk/trace_vprintk macros to support non-builtin
        format constants, and fix 'const' qualifier warnings. This is all transparent
        to developers.
      
      - etc...
      
      V2:
      
      - Rebase against last changes
      - Fix mispell on the changelog
      
      V3:
      
      - Rebase against last changes (moving trace_printk() to kernel.h)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add trace_bprintk() · 1ba28e02
      Committed by Lai Jiangshan
      Impact: add a generic printk() for tracing, like trace_printk()
      
      trace_bprintk() uses the infrastructure to record events on ring_buffer.
      
      [ fweisbec@gmail.com: ported to latest -tip, made it work if
        !CONFIG_MODULES, never free the format strings from modules
        because we can't keep track of them and conditionally create
        the ftrace format strings section (reported by Steven Rostedt) ]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: infrastructure for supporting binary record · 1427cdf0
      Committed by Lai Jiangshan
      Impact: save on memory for tracing
      
      Current tracers typically use a struct (like struct ftrace_entry,
      struct ctx_switch_entry, struct special_entry, etc.) to record a binary
      event. These structs can only record their own kind of events.
      A new kind of tracer needs a new struct and a lot of code to handle it.
      
      So we need a generic binary record for events. This infrastructure
      is for this purpose.
      
      [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
      tracing as reported by Steven Rostedt]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • vsprintf: add binary printf · 4370aa4a
      Committed by Lai Jiangshan
      Impact: add new APIs for binary trace printk infrastructure
      
      vbin_printf(): write args to a binary buffer; strings are copied
      when "%s" occurs.
      
      bstr_printf(): read from binary buffer for args and format a string
      
      [fweisbec@gmail.com: rebase]
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1236356510-8381-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
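The two-phase idea behind vbin_printf() and bstr_printf() (pack the arguments cheaply at trace time, format the text later at read time) can be illustrated with a small user-space sketch. It is not the kernel implementation: `toy_bin_pack()` and `toy_bin_format()` are hypothetical helpers, they support only `%d`, `%s`, and `%%`, and the byte layout is an assumption made for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Pack phase: walk fmt, store each %d as a raw int and each %s as a copied
 * NUL-terminated string. Returns bytes used, or -1 if bin is too small or
 * the conversion is not supported by this toy.
 */
static int toy_bin_pack(unsigned char *bin, size_t size, const char *fmt, ...)
{
	va_list ap;
	size_t used = 0;
	int ok = 1;

	assert(bin != NULL && fmt != NULL);
	va_start(ap, fmt);
	for (const char *p = fmt; *p && ok; p++) {
		if (*p != '%')
			continue;
		p++;
		if (*p == 'd') {
			int v = va_arg(ap, int);
			if (used + sizeof(v) > size) { ok = 0; break; }
			memcpy(bin + used, &v, sizeof(v));
			used += sizeof(v);
		} else if (*p == 's') {
			const char *s = va_arg(ap, const char *);
			size_t len = strlen(s) + 1;	/* copy the string itself */
			if (used + len > size) { ok = 0; break; }
			memcpy(bin + used, s, len);
			used += len;
		} else if (*p != '%') {
			ok = 0;				/* unsupported or truncated */
			break;
		}
	}
	va_end(ap);
	return ok ? (int)used : -1;
}

/*
 * Format phase: walk the same fmt, pulling arguments back out of bin.
 * Returns the length the full text would have, like snprintf().
 */
static int toy_bin_format(char *out, size_t size, const char *fmt,
			  const unsigned char *bin)
{
	size_t n = 0;

	assert(fmt != NULL && bin != NULL);
	for (const char *p = fmt; *p; p++) {
		char *dst = n < size ? out + n : NULL;
		size_t room = n < size ? size - n : 0;
		int w;

		if (*p != '%') {
			w = snprintf(dst, room, "%c", *p);
		} else if (*++p == 'd') {
			int v;
			memcpy(&v, bin, sizeof(v));
			bin += sizeof(v);
			w = snprintf(dst, room, "%d", v);
		} else if (*p == 's') {
			w = snprintf(dst, room, "%s", (const char *)bin);
			bin += strlen((const char *)bin) + 1;
		} else if (*p == '%') {
			w = snprintf(dst, room, "%%");
		} else {
			return -1;
		}
		if (w < 0)
			return -1;
		n += (size_t)w;
	}
	return (int)n;
}
```

Packing `"pid=%d comm=%s"` with `42` and `"bash"` stores one int plus five string bytes; formatting the same buffer later reproduces `pid=42 comm=bash`. The format string itself is never copied, only referenced, which is where the memory saving comes from.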
  4. 06 Mar, 2009 1 commit
  5. 05 Mar, 2009 5 commits
    • tracing: add tracing_on/tracing_off to kernel.h · 2002c258
      Committed by Steven Rostedt
      Impact: cleanup
      
      The functions tracing_start/tracing_stop have been moved to kernel.h.
      These are not the functions a developer most likely wants to use
      when they want to insert a place to stop tracing and restart it from
      user space.
      
      tracing_start/tracing_stop were created to work with things like
      suspend to ram, where even calling smp_processor_id() can crash the
      system. tracing_start/tracing_stop were used to stop the tracer from
      doing anything. These are still light weight functions, but add a bit
      more overhead to be able to stop the tracers. They also have no interface
      back to userland. That is, if the kernel calls tracing_stop, userland
      can not start tracing.
      
      What a developer most likely wants to use is tracing_on/tracing_off.
      These are very light weight functions (simply sets or clears a bit).
      These functions just stop recording into the ring buffer. The tracers
      don't even know that this happens except that they would receive NULL
      from the ring_buffer_lock_reserve function.
      
      Also, there's a way for the user land to enable or disable this bit.
      In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
      or echo "1" (same as tracing_on()) into this file. This becomes handy when
      a kernel developer is debugging and wants tracing to turn off when it
      hits an anomaly. Then the developer can examine the trace, and restart
      tracing if they want to try again (echo 1 > tracing_on).
      
      This patch moves the prototypes for tracing_on/tracing_off to kernel.h
      and comments their use, so that a kernel developer will know how
      to use them.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
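The behaviour described for tracing_on/tracing_off (a single flag that makes the buffer reservation return NULL) can be sketched in user space as follows. All `toy_*` names are hypothetical, and the buffer is a plain linear array rather than a real ring buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the trace buffer; it does not wrap like a real ring. */
static unsigned char toy_buf[256];
static size_t toy_head;
/* The single on/off flag; recording is enabled by default. */
static int toy_recording = 1;

static void toy_tracing_on(void)  { toy_recording = 1; }
static void toy_tracing_off(void) { toy_recording = 0; }

/*
 * Reserve len bytes for an event. While recording is off (or the buffer is
 * full) the caller simply gets NULL and skips writing the event.
 */
static void *toy_buf_reserve(size_t len)
{
	void *slot;

	assert(len > 0);
	if (!toy_recording || len > sizeof(toy_buf) - toy_head)
		return NULL;
	slot = toy_buf + toy_head;
	toy_head += len;
	return slot;
}
```

A tracer written against this sketch needs no knowledge of the flag: it checks the reservation result for NULL, which is the only visible effect of switching recording off.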
    • tracing/function-graph-tracer: use the more lightweight local clock · 0012693a
      Committed by Frederic Weisbecker
      Impact: decrease the risk of hangs with the graph tracer on slow systems
      
      Since the function graph tracer can spend too much time on timer
      interrupts, it's better now to use the more lightweight local
      clock. Anyway, the function graph traces are more reliable on a
      per cpu trace.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: move utility functions from ftrace.h to kernel.h · 526211bc
      Committed by Ingo Molnar
      Make common utility functions such as trace_printk() and
      tracing_start()/tracing_stop() generally available to kernel
      code.
      
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: rename ftrace_printk() => trace_printk() · 5e1607a0
      Committed by Ingo Molnar
      Impact: cleanup
      
      Use a more generic name - this also allows the prototype to move
      to kernel.h and be generally available to kernel developers who
      want to do some quick tracing.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • tracing: add lockdep tracepoints for lock acquire/release · efed792d
      Committed by Peter Zijlstra
      Augment the traces with lock names when lockdep is available:
      
       1)               |  down_read_trylock() {
       1)               |    _spin_lock_irqsave() {
       1)               |      /* lock_acquire: &sem->wait_lock */
       1)   4.201 us    |    }
       1)               |    _spin_unlock_irqrestore() {
       1)               |      /* lock_release: &sem->wait_lock */
       1)   3.523 us    |    }
       1)               |  /* lock_acquire: try read &mm->mmap_sem */
       1) + 13.386 us   |  }
       1)   1.635 us    |  find_vma();
       1)               |  handle_mm_fault() {
       1)               |    __do_fault() {
       1)               |      filemap_fault() {
       1)               |        find_lock_page() {
       1)               |          find_get_page() {
       1)               |            /* lock_acquire: read rcu_read_lock */
       1)               |            /* lock_release: rcu_read_lock */
       1)   5.697 us    |          }
       1)   8.158 us    |        }
       1) + 11.079 us   |      }
       1)               |      _spin_lock() {
       1)               |        /* lock_acquire: __pte_lockptr(page) */
       1)   3.949 us    |      }
       1)   1.460 us    |      page_add_file_rmap();
       1)               |      _spin_unlock() {
       1)               |        /* lock_release: __pte_lockptr(page) */
       1)   3.115 us    |      }
       1)               |      unlock_page() {
       1)   1.421 us    |        page_waitqueue();
       1)   1.220 us    |        __wake_up_bit();
       1)   6.519 us    |      }
       1) + 34.328 us   |    }
       1) + 37.452 us   |  }
       1)               |  up_read() {
       1)               |  /* lock_release: &mm->mmap_sem */
       1)               |    _spin_lock_irqsave() {
       1)               |      /* lock_acquire: &sem->wait_lock */
       1)   3.865 us    |    }
       1)               |    _spin_unlock_irqrestore() {
       1)               |      /* lock_release: &sem->wait_lock */
       1)   8.562 us    |    }
       1) + 17.370 us   |  }
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Török Edwin <edwintorok@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1236166375.5330.7209.camel@laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 04 Mar, 2009 1 commit
  7. 03 Mar, 2009 1 commit
  8. 02 Mar, 2009 2 commits
    • tracing: add TRACE_FIELD_SPECIAL to record complex entries · d20e3b03
      Committed by Steven Rostedt
      Tom Zanussi pointed out that the simple TRACE_FIELD was not enough to
      record trace data that required memcpy. This patch addresses this issue
      by adding a TRACE_FIELD_SPECIAL. The format is similar to TRACE_FIELD
      but looks like so:
      
        TRACE_FIELD_SPECIAL(type_item, item, cmd)
      
      What TRACE_FIELD gave was:
      
        TRACE_FIELD(type, item, assign)
      
      The TRACE_FIELD would be used in declaring a structure:
      
        struct {
      	type	item;
        };
      
      And later assign it via:
      
        entry->item = assign;
      
      What TRACE_FIELD_SPECIAL gives us is:
      
      In the declaration of the structure:
      
        struct {
      	type_item;
        };
      
      And the assignment:
      
        cmd;
      
      This change log will explain the one example used in the patch:
      
       TRACE_EVENT_FORMAT(sched_switch,
      	TPPROTO(struct rq *rq, struct task_struct *prev,
      		struct task_struct *next),
      	TPARGS(rq, prev, next),
      	TPFMT("task %s:%d ==> %s:%d",
      	      prev->comm, prev->pid, next->comm, next->pid),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, prev_pid, prev->pid)
      		TRACE_FIELD(int, prev_prio, prev->prio)
      		TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
      				    next_comm,
      				    TPCMD(memcpy(TRACE_ENTRY->next_comm,
      						 next->comm,
      						 TASK_COMM_LEN)))
      		TRACE_FIELD(pid_t, next_pid, next->pid)
      		TRACE_FIELD(int, next_prio, next->prio)
      	),
      	TPRAWFMT("prev %d:%d ==> next %s:%d:%d")
      	);
      
       The struct will be created as:
      
        struct {
      	pid_t		prev_pid;
      	int		prev_prio;
      	char next_comm[TASK_COMM_LEN];
      	pid_t		next_pid;
      	int		next_prio;
        };
      
      Note the TRACE_ENTRY in the cmd part of TRACE_FIELD_SPECIAL. TRACE_ENTRY will
      be set by the tracer to point to the structure inside the trace buffer.
      
        entry->prev_pid	= prev->pid;
        entry->prev_prio	= prev->prio;
        memcpy(entry->next_comm, next->comm, TASK_COMM_LEN);
        entry->next_pid	= next->pid;
        entry->next_prio	= next->prio;
      Reported-by: Tom Zanussi <tzanussi@gmail.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
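The two expansions described in the commit above (one field list producing both a struct declaration and the assignment code) can be mimicked with a self-contained X-macro sketch. This is not the kernel's TRACE_EVENT_FORMAT machinery: `struct toy_task`, `SCHED_SWITCH_FIELDS`, the `DECL_*`/`ASSIGN_*` macros, and `fill_sched_switch()` are hypothetical, `int` replaces `pid_t` to avoid extra headers, and `TASK_COMM_LEN` is defined locally for the example.

```c
#include <assert.h>
#include <string.h>

#define TASK_COMM_LEN 16	/* local stand-in for the kernel constant */

/* Hypothetical, minimal replacement for struct task_struct. */
struct toy_task {
	char comm[TASK_COMM_LEN];
	int pid;
	int prio;
};

/* One field list; F handles simple fields, S handles "special" ones. */
#define SCHED_SWITCH_FIELDS(F, S)					\
	F(int, prev_pid, prev->pid)					\
	F(int, prev_prio, prev->prio)					\
	S(char next_comm[TASK_COMM_LEN], next_comm,			\
	  memcpy(TRACE_ENTRY->next_comm, next->comm, TASK_COMM_LEN))	\
	F(int, next_pid, next->pid)					\
	F(int, next_prio, next->prio)

/* Pass 1: turn each field into a struct member declaration. */
#define DECL_FIELD(type, item, assign)		type item;
#define DECL_SPECIAL(type_item, item, cmd)	type_item;

struct sched_switch_entry {
	SCHED_SWITCH_FIELDS(DECL_FIELD, DECL_SPECIAL)
};

/* Pass 2: turn each field into the code that fills it in. */
#define TRACE_ENTRY entry
#define ASSIGN_FIELD(type, item, assign)	entry->item = assign;
#define ASSIGN_SPECIAL(type_item, item, cmd)	cmd;

static void fill_sched_switch(struct sched_switch_entry *entry,
			      const struct toy_task *prev,
			      const struct toy_task *next)
{
	assert(entry && prev && next);
	SCHED_SWITCH_FIELDS(ASSIGN_FIELD, ASSIGN_SPECIAL)
}
```

The special field carries its full declaration (`char next_comm[TASK_COMM_LEN]`) and an arbitrary command, which is what lets an array be filled with memcpy() where a plain `entry->item = assign` would not compile.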
    • P
      5ce04e3d
  9. 01 Mar, 2009 2 commits
  10. 28 Feb, 2009 6 commits
  11. 27 Feb, 2009 4 commits
  12. 26 Feb, 2009 5 commits
    • block: reduce stack footprint of blk_recount_segments() · 1e428079
      Committed by Jens Axboe
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
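The refactoring described above (split the counting into a helper that takes only what it needs, so no large temporary structure has to live on the stack) can be sketched as follows. The `toy_*` types and functions are hypothetical, and the per-bio segment count is a simplification: the real code derives the count from the bio's pages rather than reading a stored number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified bio: a segment count and a chain link. */
struct toy_bio {
	unsigned int segs;
	struct toy_bio *next;
};

/* Hypothetical request; the padding mimics a structure too big for the stack. */
struct toy_request {
	struct toy_bio *bio;
	unsigned int nr_phys_segments;
	char other_state[400];
};

/* Helper that needs only the bio chain, not a whole request. */
static unsigned int toy_count_segments(const struct toy_bio *bio)
{
	unsigned int n = 0;

	for (; bio; bio = bio->next)
		n += bio->segs;
	return n;
}

/* Request-based caller: unchanged interface, now a thin wrapper. */
static void toy_recalc_rq_segments(struct toy_request *rq)
{
	rq->nr_phys_segments = toy_count_segments(rq->bio);
}

/* Bio-based caller: counts one bio without building a request on the stack. */
static unsigned int toy_recount_segments(struct toy_bio *bio)
{
	struct toy_bio *nxt;
	unsigned int n;

	assert(bio != NULL);
	nxt = bio->next;
	bio->next = NULL;		/* temporarily detach so only this bio counts */
	n = toy_count_segments(bio);
	bio->next = nxt;		/* restore the chain */
	return n;
}
```

With this split, the bio-based path uses a couple of words of stack instead of a full request-sized temporary.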
    • rcu: Teach RCU that idle task is not quiescent state at boot · a6826048
      Committed by Paul E. McKenney
      This patch fixes a bug located by Vegard Nossum with the aid of
      kmemcheck, updated based on review comments from Nick Piggin,
      Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
      and function-name language.  ;-)
      
      The boot CPU runs in the context of its idle thread during boot-up.
      During this time, idle_cpu(0) will always return nonzero, which will
      fool Classic and Hierarchical RCU into deciding that a large chunk of
      the boot-up sequence is a big long quiescent state.  This in turn causes
      RCU to prematurely end grace periods during this time.
      
      This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
      function to ignore the idle task as a quiescent state until the
      system has started up the scheduler in rest_init(), introducing a
      new non-API function rcu_idle_now_means_idle() to inform RCU of this
      transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
      to track this state, which is then used by rcu_check_callbacks() to
      determine if it should believe idle_cpu().
      
      Because this patch has the effect of disallowing RCU grace periods
      during long stretches of the boot-up sequence, this patch also introduces
      Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
      no-op if num_online_cpus() returns 1.  This allows boot-time code that
      calls synchronize_rcu() to proceed normally.  Note, however, that RCU
      callbacks registered by call_rcu() will likely queue up until later in
      the boot sequence.  Although rcuclassic and rcutree can also use this
      same optimization after boot completes, rcupreempt must restrict its
      use of this optimization to the portion of the boot sequence before the
      scheduler starts up, given that an rcupreempt RCU read-side critical
      section may be preempted.
      
      In addition, this patch takes Nick Piggin's suggestion to make the
      system_state global variable be __read_mostly.
      
      Changes since v4:
      
      o	Changes the name of the introduced function and variable to
      	be less emotional.  ;-)
      
      Changes since v3:
      
      o	WARN_ON(nr_context_switches() > 0) to verify that RCU
      	switches out of boot-time mode before the first context
      	switch, as suggested by Nick Piggin.
      
      Changes since v2:
      
      o	Created rcu_blocking_is_gp() internal-to-RCU API that
      	determines whether a call to synchronize_rcu() is itself
      	a grace period.
      
      o	The definition of rcu_blocking_is_gp() for rcuclassic and
      	rcutree checks to see if but a single CPU is online.
      
      o	The definition of rcu_blocking_is_gp() for rcupreempt
      	checks to see both if but a single CPU is online and if
      	the system is still in early boot.
      
      	This allows rcupreempt to again work correctly if running
      	on a single CPU after booting is complete.
      
      o	Added check to rcupreempt's synchronize_sched() for there
      	being but one online CPU.
      
      Tested all three variants both SMP and !SMP, booted fine, passed a short
      rcutorture test on both x86 and Power.
      Located-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Vegard Nossum <vegard.nossum@gmail.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
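The rcu_blocking_is_gp() rules listed in the changelog above can be expressed as a small decision sketch. This is a user-space illustration under stated assumptions, not the RCU code: the `toy_*` variables stand in for num_online_cpus() and for "the scheduler has started", and waiting for a grace period is reduced to incrementing a counter.

```c
#include <assert.h>

/* Hypothetical stand-ins for kernel state. */
static int toy_online_cpus = 1;
static int toy_scheduler_running;	/* 0 during early boot */
static int toy_grace_periods_waited;	/* counts real waits, for illustration */

/* Classic/tree flavour: blocking is itself a grace period on a single CPU. */
static int toy_blocking_is_gp(void)
{
	assert(toy_online_cpus >= 1);
	return toy_online_cpus == 1;
}

/*
 * Preemptible flavour: readers can be preempted once the scheduler runs,
 * so the shortcut is valid only on a single CPU during early boot.
 */
static int toy_preempt_blocking_is_gp(void)
{
	assert(toy_online_cpus >= 1);
	return toy_online_cpus == 1 && !toy_scheduler_running;
}

/* synchronize_rcu() analogue: a no-op whenever the shortcut applies. */
static void toy_synchronize_rcu(int preemptible)
{
	int is_gp = preemptible ? toy_preempt_blocking_is_gp()
				: toy_blocking_is_gp();

	if (is_gp)
		return;
	toy_grace_periods_waited++;	/* placeholder for a real wait */
}
```

During early boot both flavours return immediately; after the scheduler starts, only the non-preemptible flavour keeps the single-CPU shortcut.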
    • tracing: wrap arguments with PARAMS · 3cdfdf91
      Committed by Steven Rostedt
      Peter Zijlstra warned that TPPROTO and TPARGS might become something
      other than a simple copy of itself. To prevent this from having
      side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
      PARAMS() macro to be defined as just a wrapper.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT · eef62a68
      Committed by Steven Rostedt
      There's been a bit of confusion as to whether DEFINE/DECLARE_TRACE_FMT should
      be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
      TRACE_FORMAT.
      Reported-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ide: fix refcounting in device drivers · 8fed4368
      Committed by Bartlomiej Zolnierkiewicz
      During host driver module removal del_gendisk() results in a final
      put on drive->gendev and freeing the drive by drive_release_dev().
      
      Convert device drivers from using struct kref to use struct device
      so device driver's object holds reference on ->gendev and prevents
      drive from prematurely going away.
      
      Also fix ->remove methods to not erroneously drop reference on a
      host driver by using only put_device() instead of ide*_put().
      Reported-by: Stanislaw Gruszka <stf_xl@wp.pl>
      Tested-by: Stanislaw Gruszka <stf_xl@wp.pl>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
  13. 25 Feb, 2009 6 commits