1. 07 3月, 2009 1 次提交
  2. 06 3月, 2009 1 次提交
  3. 05 3月, 2009 5 次提交
    • S
      tracing: add tracing_on/tracing_off to kernel.h · 2002c258
      Steven Rostedt 提交于
      Impact: cleanup
      
      The functions tracing_start/tracing_stop have been moved to kernel.h.
      These are not the functions a developer most likely wants to use
      when they want to insert a place to stop tracing and restart it from
      user space.
      
      tracing_start/tracing_stop was created to work with things like
      suspend to ram, where even calling smp_processor_id() can crash the
      system. The tracing_start/tracing_stop was used to stop the tracer from
      doing anything. These are still light weight functions, but add a bit
      more overhead to be able to stop the tracers. They also have no interface
      back to userland. That is, if the kernel calls tracing_stop, userland
      can not start tracing.
      
      What a developer most likely wants to use is tracing_on/tracing_off.
      These are very light weight functions (simply sets or clears a bit).
      These functions just stop recording into the ring buffer. The tracers
      don't even know that this happens except that they would receive NULL
      from the ring_buffer_lock_reserve function.
      
      Also, there's a way for the user land to enable or disable this bit.
      In debugfs/tracing/tracing_on, a user may echo "0" (same as tracing_off())
      or echo "1" (same as tracing_on()) into this file. This becomes handy when
      a kernel developer is debugging and wants tracing to turn off when it
      hits an anomaly. Then the developer can examine the trace, and restart
      tracing if they want to try again (echo 1 > tracing_on).
      
      This patch moves the prototypes for tracing_on/tracing_off to kernel.h
      and comments their use, so that a kernel developer will know how
      to use them.
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      2002c258
    • F
      tracing/function-graph-tracer: use the more lightweight local clock · 0012693a
      Frederic Weisbecker 提交于
      Impact: decrease hangs risks with the graph tracer on slow systems
      
      Since the function graph tracer can spend too much time on timer
      interrupts, it's better now to use the more lightweight local
      clock. Anyway, the function graph traces are more reliable on a
      per cpu trace.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <49af243d.06e9300a.53ad.ffff840c@mx.google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      0012693a
    • I
      tracing: move utility functions from ftrace.h to kernel.h · 526211bc
      Ingo Molnar 提交于
      Make common utility functions such as trace_printk() and
      tracing_start()/tracing_stop() generally available to kernel
      code.
      
      Cc: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      526211bc
    • I
      tracing: rename ftrace_printk() => trace_printk() · 5e1607a0
      Ingo Molnar 提交于
      Impact: cleanup
      
      Use a more generic name - this also allows the prototype to move
      to kernel.h and be generally available to kernel developers who
      want to do some quick tracing.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5e1607a0
    • P
      tracing: add lockdep tracepoints for lock acquire/release · efed792d
      Peter Zijlstra 提交于
      Augment the traces with lock names when lockdep is available:
      
       1)               |  down_read_trylock() {
       1)               |    _spin_lock_irqsave() {
       1)               |      /* lock_acquire: &sem->wait_lock */
       1)   4.201 us    |    }
       1)               |    _spin_unlock_irqrestore() {
       1)               |      /* lock_release: &sem->wait_lock */
       1)   3.523 us    |    }
       1)               |  /* lock_acquire: try read &mm->mmap_sem */
       1) + 13.386 us   |  }
       1)   1.635 us    |  find_vma();
       1)               |  handle_mm_fault() {
       1)               |    __do_fault() {
       1)               |      filemap_fault() {
       1)               |        find_lock_page() {
       1)               |          find_get_page() {
       1)               |            /* lock_acquire: read rcu_read_lock */
       1)               |            /* lock_release: rcu_read_lock */
       1)   5.697 us    |          }
       1)   8.158 us    |        }
       1) + 11.079 us   |      }
       1)               |      _spin_lock() {
       1)               |        /* lock_acquire: __pte_lockptr(page) */
       1)   3.949 us    |      }
       1)   1.460 us    |      page_add_file_rmap();
       1)               |      _spin_unlock() {
       1)               |        /* lock_release: __pte_lockptr(page) */
       1)   3.115 us    |      }
       1)               |      unlock_page() {
       1)   1.421 us    |        page_waitqueue();
       1)   1.220 us    |        __wake_up_bit();
       1)   6.519 us    |      }
       1) + 34.328 us   |    }
       1) + 37.452 us   |  }
       1)               |  up_read() {
       1)               |  /* lock_release: &mm->mmap_sem */
       1)               |    _spin_lock_irqsave() {
       1)               |      /* lock_acquire: &sem->wait_lock */
       1)   3.865 us    |    }
       1)               |    _spin_unlock_irqrestore() {
       1)               |      /* lock_release: &sem->wait_lock */
       1)   8.562 us    |    }
       1) + 17.370 us   |  }
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <1236166375.5330.7209.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      efed792d
  4. 04 3月, 2009 1 次提交
  5. 03 3月, 2009 1 次提交
  6. 02 3月, 2009 2 次提交
    • S
      tracing: add TRACE_FIELD_SPECIAL to record complex entries · d20e3b03
      Steven Rostedt 提交于
      Tom Zanussi pointed out that the simple TRACE_FIELD was not enough to
      record trace data that required memcpy. This patch addresses this issue
      by adding a TRACE_FIELD_SPECIAL. The format is similar to TRACE_FIELD
      but looks like so:
      
        TRACE_FIELD_SPECIAL(type_item, item, cmd)
      
      What TRACE_FIELD gave was:
      
        TRACE_FIELD(type, item, assign)
      
      The TRACE_FIELD would be used in declaring a structure:
      
        struct {
      	type	item;
        };
      
      And later assign it via:
      
        entry->item = assign;
      
      What TRACE_FIELD_SPECIAL gives us is:
      
      In the declaration of the structure:
      
        struct {
      	type_item;
        };
      
      And the assignment:
      
        cmd;
      
      This change log will explain the one example used in the patch:
      
       TRACE_EVENT_FORMAT(sched_switch,
      	TPPROTO(struct rq *rq, struct task_struct *prev,
      		struct task_struct *next),
      	TPARGS(rq, prev, next),
      	TPFMT("task %s:%d ==> %s:%d",
      	      prev->comm, prev->pid, next->comm, next->pid),
      	TRACE_STRUCT(
      		TRACE_FIELD(pid_t, prev_pid, prev->pid)
      		TRACE_FIELD(int, prev_prio, prev->prio)
      		TRACE_FIELD_SPECIAL(char next_comm[TASK_COMM_LEN],
      				    next_comm,
      				    TPCMD(memcpy(TRACE_ENTRY->next_comm,
      						 next->comm,
      						 TASK_COMM_LEN)))
      		TRACE_FIELD(pid_t, next_pid, next->pid)
      		TRACE_FIELD(int, next_prio, next->prio)
      	),
      	TPRAWFMT("prev %d:%d ==> next %s:%d:%d")
      	);
      
       The struct will be create as:
      
        struct {
      	pid_t		prev_pid;
      	int		prev_prio;
      	char next_comm[TASK_COMM_LEN];
      	pid_t		next_pid;
      	int		next_prio;
        };
      
      Note the TRACE_ENTRY in the cmd part of TRACE_SPECIAL. TRACE_ENTRY will
      be set by the tracer to point to the structure inside the trace buffer.
      
        entry->prev_pid	= prev->pid;
        entry->prev_prio	= prev->prio;
        memcpy(entry->next_comm, next->comm, TASK_COMM_LEN);
        entry->next_pid	= next->pid;
        entry->next_prio	= next->prio
      Reported-by: NTom Zanussi <tzanussi@gmail.com>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      d20e3b03
    • P
      5ce04e3d
  7. 01 3月, 2009 2 次提交
  8. 28 2月, 2009 6 次提交
  9. 27 2月, 2009 4 次提交
  10. 26 2月, 2009 5 次提交
    • J
      block: reduce stack footprint of blk_recount_segments() · 1e428079
      Jens Axboe 提交于
      blk_recalc_rq_segments() requires a request structure passed in, which
      we don't have from blk_recount_segments(). So the latter allocates one on
      the stack, using > 400 bytes of stack for that. This can cause us to spill
      over one page of stack from ext4 at least:
      
       0)     4560     400   blk_recount_segments+0x43/0x62
       1)     4160      32   bio_phys_segments+0x1c/0x24
       2)     4128      32   blk_rq_bio_prep+0x2a/0xf9
       3)     4096      32   init_request_from_bio+0xf9/0xfe
       4)     4064     112   __make_request+0x33c/0x3f6
       5)     3952     144   generic_make_request+0x2d1/0x321
       6)     3808      64   submit_bio+0xb9/0xc3
       7)     3744      48   submit_bh+0xea/0x10e
       8)     3696     368   ext4_mb_init_cache+0x257/0xa6a [ext4]
       9)     3328     288   ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
      10)     3040     160   ext4_mb_new_blocks+0x211/0x4b4 [ext4]
      11)     2880     336   ext4_ext_get_blocks+0xb61/0xd45 [ext4]
      12)     2544      96   ext4_get_blocks_wrap+0xf2/0x200 [ext4]
      13)     2448      80   ext4_da_get_block_write+0x6e/0x16b [ext4]
      14)     2368     352   mpage_da_map_blocks+0x7e/0x4b3 [ext4]
      15)     2016     352   ext4_da_writepages+0x2ce/0x43c [ext4]
      16)     1664      32   do_writepages+0x2d/0x3c
      17)     1632     144   __writeback_single_inode+0x162/0x2cd
      18)     1488      96   generic_sync_sb_inodes+0x1e3/0x32b
      19)     1392      16   sync_sb_inodes+0xe/0x10
      20)     1376      48   writeback_inodes+0x69/0xb3
      21)     1328     208   balance_dirty_pages_ratelimited_nr+0x187/0x2f9
      22)     1120     224   generic_file_buffered_write+0x1d4/0x2c4
      23)      896     176   __generic_file_aio_write_nolock+0x35f/0x393
      24)      720      80   generic_file_aio_write+0x6c/0xc8
      25)      640      80   ext4_file_write+0xa9/0x137 [ext4]
      26)      560     320   do_sync_write+0xf0/0x137
      27)      240      48   vfs_write+0xb3/0x13c
      28)      192      64   sys_write+0x4c/0x74
      29)      128     128   system_call_fastpath+0x16/0x1b
      
      Split the segment counting out into a __blk_recalc_rq_segments() helper
      to avoid allocating an onstack request just for checking the physical
      segment count.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1e428079
    • P
      rcu: Teach RCU that idle task is not quiscent state at boot · a6826048
      Paul E. McKenney 提交于
      This patch fixes a bug located by Vegard Nossum with the aid of
      kmemcheck, updated based on review comments from Nick Piggin,
      Ingo Molnar, and Andrew Morton.  And cleans up the variable-name
      and function-name language.  ;-)
      
      The boot CPU runs in the context of its idle thread during boot-up.
      During this time, idle_cpu(0) will always return nonzero, which will
      fool Classic and Hierarchical RCU into deciding that a large chunk of
      the boot-up sequence is a big long quiescent state.  This in turn causes
      RCU to prematurely end grace periods during this time.
      
      This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks()
      function to ignore the idle task as a quiescent state until the
      system has started up the scheduler in rest_init(), introducing a
      new non-API function rcu_idle_now_means_idle() to inform RCU of this
      transition.  RCU maintains an internal rcu_idle_cpu_truthful variable
      to track this state, which is then used by rcu_check_callback() to
      determine if it should believe idle_cpu().
      
      Because this patch has the effect of disallowing RCU grace periods
      during long stretches of the boot-up sequence, this patch also introduces
      Josh Triplett's UP-only optimization that makes synchronize_rcu() be a
      no-op if num_online_cpus() returns 1.  This allows boot-time code that
      calls synchronize_rcu() to proceed normally.  Note, however, that RCU
      callbacks registered by call_rcu() will likely queue up until later in
      the boot sequence.  Although rcuclassic and rcutree can also use this
      same optimization after boot completes, rcupreempt must restrict its
      use of this optimization to the portion of the boot sequence before the
      scheduler starts up, given that an rcupreempt RCU read-side critical
      section may be preeempted.
      
      In addition, this patch takes Nick Piggin's suggestion to make the
      system_state global variable be __read_mostly.
      
      Changes since v4:
      
      o	Changes the name of the introduced function and variable to
      	be less emotional.  ;-)
      
      Changes since v3:
      
      o	WARN_ON(nr_context_switches() > 0) to verify that RCU
      	switches out of boot-time mode before the first context
      	switch, as suggested by Nick Piggin.
      
      Changes since v2:
      
      o	Created rcu_blocking_is_gp() internal-to-RCU API that
      	determines whether a call to synchronize_rcu() is itself
      	a grace period.
      
      o	The definition of rcu_blocking_is_gp() for rcuclassic and
      	rcutree checks to see if but a single CPU is online.
      
      o	The definition of rcu_blocking_is_gp() for rcupreempt
      	checks to see both if but a single CPU is online and if
      	the system is still in early boot.
      
      	This allows rcupreempt to again work correctly if running
      	on a single CPU after booting is complete.
      
      o	Added check to rcupreempt's synchronize_sched() for there
      	being but one online CPU.
      
      Tested all three variants both SMP and !SMP, booted fine, passed a short
      rcutorture test on both x86 and Power.
      Located-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NVegard Nossum <vegard.nossum@gmail.com>
      Tested-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a6826048
    • S
      tracing: wrap arguments with PARAMS · 3cdfdf91
      Steven Rostedt 提交于
      Peter Zijlstra warned that TPPROTO and TPARGS might become something
      other than a simple copy of itself. To prevent this from having
      side effects in the TRACE_FORMAT macro in tracepoint.h, we add a
      PARAMS() macro to be defined as just a wrapper.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      3cdfdf91
    • S
      tracing: rename DEFINE_TRACE_FMT to just TRACE_FORMAT · eef62a68
      Steven Rostedt 提交于
      There's been a bit confusion to whether DEFINE/DECLARE_TRACE_FMT should
      be a DEFINE or a DECLARE. Ingo Molnar suggested simply calling it
      TRACE_FORMAT.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
      eef62a68
    • B
      ide: fix refcounting in device drivers · 8fed4368
      Bartlomiej Zolnierkiewicz 提交于
      During host driver module removal del_gendisk() results in a final
      put on drive->gendev and freeing the drive by drive_release_dev().
      
      Convert device drivers from using struct kref to use struct device
      so device driver's object holds reference on ->gendev and prevents
      drive from prematurely going away.
      
      Also fix ->remove methods to not erroneously drop reference on a
      host driver by using only put_device() instead of ide*_put().
      Reported-by: NStanislaw Gruszka <stf_xl@wp.pl>
      Tested-by: NStanislaw Gruszka <stf_xl@wp.pl>
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      8fed4368
  11. 25 2月, 2009 7 次提交
  12. 23 2月, 2009 1 次提交
  13. 22 2月, 2009 1 次提交
  14. 21 2月, 2009 3 次提交
    • M
      8250: fix boot hang with serial console when using with Serial Over Lan port · b6adea33
      Mauro Carvalho Chehab 提交于
      Intel 8257x Ethernet boards have a feature called Serial Over Lan.
      
      This feature works by emulating a serial port, and it is detected by
      kernel as a normal 8250 port.  However, this emulation is not perfect, as
      also noticed on changeset 7500b1f6.
      
      Before this patch, the kernel were trying to check if the serial TX is
      capable of work using IRQ's.
      
      This were done with a code similar this:
      
              serial_outp(up, UART_IER, UART_IER_THRI);
              lsr = serial_in(up, UART_LSR);
              iir = serial_in(up, UART_IIR);
              serial_outp(up, UART_IER, 0);
      
              if (lsr & UART_LSR_TEMT && iir & UART_IIR_NO_INT)
      		up->bugs |= UART_BUG_TXEN;
      
      This works fine for other 8250 ports, but, on 8250-emulated SoL port, the
      chip is a little lazy to down UART_IIR_NO_INT at UART_IIR register.
      
      Due to that, UART_BUG_TXEN is sometimes enabled.  However, as TX IRQ keeps
      working, and the TX polling is now enabled, the driver miss-interprets the
      IRQ received later, hanging up the machine until a key is pressed at the
      serial console.
      
      This is the 6 version of this patch.  Previous versions were trying to
      introduce a large enough delay between serial_outp and serial_in(up,
      UART_IIR), but not taking forever.  However, the needed delay couldn't be
      safely determined.
      
      At the experimental tests, a delay of 1us solves most of the cases, but
      still hangs sometimes.  Increasing the delay to 5us was better, but still
      doesn't solve.  A very high delay of 50 ms seemed to work every time.
      
      However, poking around with delays and pray for it to be enough doesn't
      seem to be a good approach, even for a quirk.
      
      So, instead of playing with random large arbitrary delays, let's just
      disable UART_BUG_TXEN for all SoL ports.
      
      [akpm@linux-foundation.org: fix warnings]
      Signed-off-by: NMauro Carvalho Chehab <mchehab@redhat.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6adea33
    • M
      spi_bitbang: add more lowlevel function documentation · 01b24fee
      Michael Buesch 提交于
      This adds more documentation of the lowlevel API to avoid future bugs.
      Signed-off-by: NMichael Buesch <mb@bu3sch.de>
      Acked-by: NDavid Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01b24fee
    • J
      slab: introduce kzfree() · 3ef0e5ba
      Johannes Weiner 提交于
      kzfree() is a wrapper for kfree() that additionally zeroes the underlying
      memory before releasing it to the slab allocator.
      
      Currently there is code which memset()s the memory region of an object
      before releasing it back to the slab allocator to make sure
      security-sensitive data are really zeroed out after use.
      
      These callsites can then just use kzfree() which saves some code, makes
      users greppable and allows for a stupid destructor that isn't necessarily
      aware of the actual object size.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Matt Mackall <mpm@selenic.com>
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ef0e5ba