1. 18 Feb, 2009 1 commit
    • ftrace: rename _hook to _probe · b6887d79
      Steven Rostedt committed
      Impact: clean up
      
      Ingo Molnar did not like the _hook naming convention used by the
      selected function tracer. Luis Claudio R. Goncalves suggested using
      the "_probe" extension. This patch renames the functions and
      variables from "_hook" to "_probe".
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  2. 17 Feb, 2009 5 commits
    • ftrace: fix !CONFIG_FTRACE [un_]register_ftrace_command() prototypes · 97d0bb8d
      Ingo Molnar committed
      Impact: build fix
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ftrace: add pretty print to selected function traces · 809dcf29
      Steven Rostedt committed
      This patch adds a call back for the tracers that have hooks to
      selected functions. This allows the tracer to show better output
      in the set_ftrace_filter file.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: add tracing_is_on to test if ring buffer is enabled · 988ae9d6
      Steven Rostedt committed
      This patch adds the tracing_is_on() interface to tell if the ring
      buffer is turned on or not.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ftrace: trace different functions with a different tracer · 59df055f
      Steven Rostedt committed
      Impact: new feature
      
      Currently, the function tracer only gives you an ability to hook
      a tracer to all functions being traced. The dynamic function trace
      allows you to pick and choose which of those functions will be
      traced, but all functions being traced will call all tracers that
      registered with the function tracer.
      
      This patch adds a new feature that allows a tracer to hook to specific
      functions, even when all functions are being traced. It allows for
      different functions to call different tracer hooks.
      
      The way this is accomplished is by a special function that will hook
      to the function tracer and will set up a hash table knowing which
      tracer hook to call with which function. This is the most general
      and easiest method to accomplish this. Later, an arch may choose
      to supply their own method in changing the mcount call of a function
      to call a different tracer. But that will be an exercise for the
      future.
      
      To register a function:
      
       struct ftrace_hook_ops {
      	void			(*func)(unsigned long ip,
      					unsigned long parent_ip,
      					void **data);
      	int			(*callback)(unsigned long ip, void **data);
      	void			(*free)(void **data);
       };
      
       int register_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
      				  void *data);
      
      glob is a simple glob to search for the functions to hook.
      ops is a pointer to the operations (listed below)
      data is the default data to be passed to the hook functions when traced
      
      ops:
       func is the hook function to call when the functions are traced
       callback is a callback function that is called when setting up the hash.
         It is for a tracer that needs to do something special for each
         function being traced and wants to give each function its own
         data. The address of the entry data is passed to this callback,
         so that the callback can update the entry to whatever it likes.
       free is a callback for when the entry is freed. In case the tracer
         allocated any data, it is given the chance to free it.
      
      To unregister we have three functions:
      
        void
        unregister_ftrace_function_hook(char *glob, struct ftrace_hook_ops *ops,
      				void *data)
      
      This will unregister all hooks that match glob, point to ops, and
      whose data matches data. (Note: if glob is NULL, blank or '*',
      all functions will be tested.)
      
        void
        unregister_ftrace_function_hook_func(char *glob,
      				 struct ftrace_hook_ops *ops)
      
      This will unregister all functions matching glob that have an entry
      pointing to ops.
      
        void unregister_ftrace_function_hook_all(char *glob)
      
      This simply unregisters all hooks on functions matching glob.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
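The hash-table dispatch described in this commit can be pictured with a small user-space sketch. Only struct ftrace_hook_ops comes from the commit message; struct hook_entry, hook_register_ip(), hook_dispatch(), the hash size and the example count_hook() are hypothetical names made up for illustration. The sketch also registers a single address rather than matching a glob, and treating a negative callback return as "do not register" is an assumption.

```c
#include <assert.h>
#include <stdlib.h>

#define HOOK_HASH_BITS 7
#define HOOK_HASH_SIZE (1u << HOOK_HASH_BITS)

/* Same layout as the ops structure quoted in the commit message. */
struct ftrace_hook_ops {
	void (*func)(unsigned long ip, unsigned long parent_ip, void **data);
	int (*callback)(unsigned long ip, void **data);
	void (*free)(void **data);
};

/* Hypothetical hash entry: one per (function address, ops) pair. */
struct hook_entry {
	struct hook_entry *next;
	unsigned long ip;
	struct ftrace_hook_ops *ops;
	void *data;
};

static struct hook_entry *hook_hash[HOOK_HASH_SIZE];

static unsigned int hook_key(unsigned long ip)
{
	return (unsigned int)(ip >> 4) & (HOOK_HASH_SIZE - 1);
}

/* Attach ops to one function address (the real API matches a glob). */
static int hook_register_ip(unsigned long ip, struct ftrace_hook_ops *ops,
			    void *data)
{
	struct hook_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->ip = ip;
	e->ops = ops;
	e->data = data;
	/* Give the tracer a chance to replace the default data. */
	if (ops->callback && ops->callback(ip, &e->data) < 0) {
		free(e);
		return -1;
	}
	e->next = hook_hash[hook_key(ip)];
	hook_hash[hook_key(ip)] = e;
	return 0;
}

/* Runs on every traced function; returns how many hooks were called. */
static int hook_dispatch(unsigned long ip, unsigned long parent_ip)
{
	struct hook_entry *e;
	int called = 0;

	for (e = hook_hash[hook_key(ip)]; e; e = e->next) {
		if (e->ip == ip) {
			e->ops->func(ip, parent_ip, &e->data);
			called++;
		}
	}
	return called;
}

/* Example hook: data points at a hit counter. */
static void count_hook(unsigned long ip, unsigned long parent_ip, void **data)
{
	(void)ip;
	(void)parent_ip;
	++*(unsigned long *)*data;
}

static struct ftrace_hook_ops count_ops = { .func = count_hook };
```

Registering count_ops for one address and then dispatching a different address that lands in the same bucket leaves the counter untouched, which is the "different functions call different tracer hooks" behaviour the commit describes.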
    • ftrace: add command interface for function selection · f6180773
      Steven Rostedt committed
      Allow other tracers to add their own commands for function
      selection. This interface gives a tracer the ability to name a
      command for function selection. Right now it is pretty limited
      in what it offers, but this is a building step for more features.
      
      The :mod: command is converted to this interface and also serves
      as a template for other implementations.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
  3. 13 Feb, 2009 2 commits
    • sched: do not account for NMIs · 2a7b8df0
      Steven Rostedt committed
      Impact: avoid corruption in system time accounting
      
      Martin Schwidefsky told me that there was an issue with NMIs and
      system accounting. The problem is that the accounting code is
      not reentrant, and if an NMI goes off after an interrupt it can
      corrupt the accounting.
      
      For now, the best we can do is to treat NMIs like SMIs and not
      account for them.
      
      This patch changes nmi_enter to not call __irq_enter and to do
      the preempt-count and tracing calls directly.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • preempt-count: force hardirq-count to max of 10 · 5a5fb7db
      Steven Rostedt committed
      To add a bit in the preempt_count to be set when in NMI context, we
      found that some archs did not have enough bits to spare. This is
      due to the hardirq_count being a mask that can hold NR_IRQS.
      
      Some archs allow for over 16000 IRQs, and that would require a mask
      of 14 bits. The softirq mask is 8 bits and the preempt disable mask
      is also 8 bits.  The PREEMPT_ACTIVE bit is bit 30, and bit 31 would
      make the preempt_count (which is type int) a negative number.
      A negative preempt_count is a sign of failure.
      
      Add them up, 14+8+8+1+1, and you get 32 bits. No room for the NMI bit.
      
      But the hardirq_count is to track the number of nested IRQs, not
      the number of total IRQs.  This originally took the paranoid approach
      of setting the max nesting to NR_IRQS. But when we have archs with
      over 1000 IRQs, it is not practical to think they will ever all
      nest on a single CPU. Not to mention that this would most definitely
      cause a stack overflow.
      
      This patch sets a max of 10 bits to be used for IRQ nesting.
      I did a 'git grep HARDIRQ' to examine all users of HARDIRQ_BITS and
      HARDIRQ_MASK, and found that making it a max of 10 would not hurt
      anyone. I did find that the m68k expected it to be 8 bits, so
      I allow for the archs to set the number to be less than 10.
      
      I removed the setting of HARDIRQ_BITS from the archs that set it
      to more than 10. This includes ALPHA, ia64 and avr32.
      
      This will always allow room for the NMI bit, and if we need to allow
      for NMI nesting, we have 4 bits to play with.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
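The bit budget in this commit can be checked with a short sketch. The field widths (8 preempt bits, 8 softirq bits, PREEMPT_ACTIVE at bit 30) are taken from the message; placing a single NMI bit directly above the hardirq field, and the helper names bits_needed() and spare_bits(), are assumptions made for illustration.

```c
#include <assert.h>

#define PREEMPT_BITS	8
#define SOFTIRQ_BITS	8
#define HARDIRQ_BITS	10	/* the cap introduced by this patch */
#define NMI_BITS	1	/* assumed: one bit, no NMI nesting */

#define PREEMPT_SHIFT	0
#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)
#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)
#define NMI_SHIFT	(HARDIRQ_SHIFT + HARDIRQ_BITS)	/* assumed position */

#define PREEMPT_ACTIVE_BIT	30	/* bit 31 is the sign bit of the int */

#define FIELD_MASK(bits, shift)	(((1UL << (bits)) - 1) << (shift))
#define HARDIRQ_MASK	FIELD_MASK(HARDIRQ_BITS, HARDIRQ_SHIFT)
#define NMI_MASK	FIELD_MASK(NMI_BITS, NMI_SHIFT)

/* Number of bits needed to hold the value n. */
static int bits_needed(unsigned long n)
{
	int bits = 0;

	while (n) {
		bits++;
		n >>= 1;
	}
	return bits;
}

/* Bits left below PREEMPT_ACTIVE for a given hardirq field width. */
static int spare_bits(int hardirq_bits)
{
	return PREEMPT_ACTIVE_BIT -
	       (PREEMPT_BITS + SOFTIRQ_BITS + hardirq_bits);
}
```

With a 14-bit hardirq field nothing is left below bit 30, while the 10-bit cap leaves the four bits the message mentions, one of which can carry the NMI flag.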
  4. 12 Feb, 2009 2 commits
    • syscall define: fix uml compile bug · 6c597963
      Heiko Carstens committed
      With the new system call defines we get this on uml:
      
      arch/um/sys-i386/built-in.o: In function `sys_call_table':
      (.rodata+0x308): undefined reference to `sys_sigprocmask'
      
      Reason for this is that uml passes the preprocessor option
      -Dsigprocmask=kernel_sigprocmask to gcc when compiling the kernel.
      This causes SYSCALL_DEFINE3(sigprocmask, ...) to be expanded to
      SYSCALL_DEFINEx(3, kernel_sigprocmask, ...) and finally to a system
      call named sys_kernel_sigprocmask.  However sys_sigprocmask is missing
      because of this.
      
      To avoid macro expansion of the system call name, just concatenate the
      name at the first define instead of carrying it through several levels.
      This was pointed out by Al Viro.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: WANG Cong <wangcong@zeuux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
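The preprocessor behaviour behind this bug can be reproduced in isolation. In the sketch below the #define stands in for uml's -Dsigprocmask=kernel_sigprocmask option, and the LATE_* and EARLY_* macros are hypothetical simplifications of the SYSCALL_DEFINE machinery that produce the symbol name as a string so it can be inspected. A macro argument is fully expanded before substitution unless it is an operand of # or ##, so pasting at the first level protects the name.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for uml's -Dsigprocmask=kernel_sigprocmask option. */
#define sigprocmask kernel_sigprocmask

/* Turn the final symbol into a string so it can be inspected. */
#define SYM_STR(sym)		#sym

/* Before: the bare name travels through one level and gets expanded. */
#define LATE_DEFINEx(name)	SYM_STR(sys_##name)
#define LATE_DEFINE3(name)	LATE_DEFINEx(name)

/* After: the name is pasted at the first define, so it is never expanded. */
#define EARLY_DEFINEx(sname)	SYM_STR(sys##sname)
#define EARLY_DEFINE3(name)	EARLY_DEFINEx(_##name)

static const char *late_name(void)
{
	return LATE_DEFINE3(sigprocmask);
}

static const char *early_name(void)
{
	return EARLY_DEFINE3(sigprocmask);
}
```

Here late_name() yields the unwanted sys_kernel_sigprocmask, while early_name() yields sys_sigprocmask, the symbol the system call table references.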
    • cgroups: fix lockdep subclasses overflow · cfebe563
      Li Zefan committed
      I enabled all cgroup subsystems when compiling the kernel, and then:
       # mount -t cgroup -o net_cls xxx /mnt
       # mkdir /mnt/0
      
      This showed up immediately:
       BUG: MAX_LOCKDEP_SUBCLASSES too low!
       turning off the locking correctness validator.
      
      It's caused by the cgroup hierarchy lock:
      	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
      		struct cgroup_subsys *ss = subsys[i];
      		if (ss->root == root)
      			mutex_lock_nested(&ss->hierarchy_mutex, i);
      	}
      
      Now we have 9 cgroup subsystems, and the above 'i' for net_cls is 8, but
      MAX_LOCKDEP_SUBCLASSES is 8.
      
      This patch uses different lockdep keys for different subsystems.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: Paul Menage <menage@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 11 Feb, 2009 6 commits
  6. 10 Feb, 2009 2 commits
  7. 09 Feb, 2009 2 commits
  8. 08 Feb, 2009 6 commits
    • trace: trivial fixes in comment typos. · 57794a9d
      Wenji Huang committed
      Impact: clean up
      
      Fixed several typos in the comments.
      Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • ring-buffer: use generic version of in_nmi · a81bd80a
      Steven Rostedt committed
      Impact: clean up
      
      Now that a generic in_nmi is available, this patch removes the
      special code in the ring_buffer and implements the in_nmi generic
      version instead.
      
      With this change, I was also able to rename the "arch_ftrace_nmi_enter"
      back to "ftrace_nmi_enter" and remove the code from the ring buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • nmi: add generic nmi tracking state · 375b38b4
      Steven Rostedt committed
      This code adds an in_nmi() macro that uses the current task's preempt count
      to track when it is in NMI context. Other parts of the kernel can
      use this to determine if the context is in NMI context or not.
      
      This code was inspired by the -rt patch in_nmi version that was
      written by Peter Zijlstra, who borrowed that code from
      Mathieu Desnoyers.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
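A minimal user-space sketch of tracking NMI context through a bit in the preempt count follows. The variable preempt_count_sim stands in for the current task's preempt count, the bit position chosen for NMI_SHIFT is an assumption, and nmi_enter_sim()/nmi_exit_sim() are hypothetical names; only the in_nmi() name comes from the commit.

```c
#include <assert.h>

#define NMI_SHIFT	26			/* assumed bit position */
#define NMI_MASK	(1UL << NMI_SHIFT)

/* Stand-in for the current task's preempt count. */
static unsigned long preempt_count_sim;

/* Non-zero while the NMI bit is set in the preempt count. */
#define in_nmi()	(preempt_count_sim & NMI_MASK)

static void nmi_enter_sim(void)
{
	assert(!in_nmi());	/* this sketch does not model nested NMIs */
	preempt_count_sim += NMI_MASK;
}

static void nmi_exit_sim(void)
{
	assert(in_nmi());
	preempt_count_sim -= NMI_MASK;
}
```

Because the flag lives in its own bit, entering and leaving NMI context leaves the lower preempt, softirq and hardirq counts untouched.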
    • ring-buffer: allow tracing_off to be used in core kernel code · d8b891a2
      Steven Rostedt committed
      tracing_off() is the fastest way to stop recording to the ring buffers.
      This may be used in places like panic and die, just before the
      ftrace_dump is called.
      
      This patch adds the appropriate CPP conditionals to make it a stub
      function when the ring buffer is not configured in.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
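The stub pattern this commit refers to usually looks like the sketch below. CONFIG_RING_BUFFER is assumed to be the relevant config symbol, the stub bodies (including tracing_is_on() returning 0) are assumptions rather than a copy of the header, and panic_path_sim() is a hypothetical caller added for illustration.

```c
#include <assert.h>

/* Normally set by Kconfig; left undefined here so the stubs are used. */
#ifdef CONFIG_RING_BUFFER
void tracing_on(void);
void tracing_off(void);
int tracing_is_on(void);
#else
/* Empty inline stubs let core code call these unconditionally. */
static inline void tracing_on(void) { }
static inline void tracing_off(void) { }
static inline int tracing_is_on(void) { return 0; }
#endif

/* Hypothetical caller modelled on a panic path. */
static int panic_path_sim(void)
{
	tracing_off();		/* stop recording before any dump */
	return tracing_is_on();
}
```

Callers such as panic or die paths then need no #ifdef of their own, and the compiler drops the empty calls when the ring buffer is not built.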
    • ring-buffer: add NMI protection for spinlocks · 78d904b4
      Steven Rostedt committed
      Impact: prevent deadlock in NMI
      
      The ring buffers are not yet totally lockless with writing to
      the buffer. When a writer crosses a page, it grabs a per cpu spinlock
      to protect against a reader. The spinlocks taken by a writer are not
      to protect against other writers, since a writer can only write to
      its own per cpu buffer. The spinlocks protect against readers that
      can touch any cpu buffer. The writers are made to be reentrant
      with the spinlocks disabling interrupts.
      
      The problem arises when an NMI writes to the buffer, and that write
      crosses a page boundary. If it grabs a spinlock, it can be racing
      with another writer (since disabling interrupts does not protect
      against NMIs) or with a reader on the same CPU. Luckily, most of the
      users are not reentrant, which protects against this issue. But if a
      user of the ring buffer becomes reentrant (which the ring buffers do
      allow) and the NMI also writes to the ring buffer, then we risk a
      deadlock.
      
      This patch moves the ftrace_nmi_enter called by nmi_enter() to the
      ring buffer code. It replaces the current ftrace_nmi_enter that is
      used by arch specific code to arch_ftrace_nmi_enter and updates
      the Kconfig to handle it.
      
      When an NMI is called, it will set a per cpu variable in the ring buffer
      code and will clear it when the NMI exits. If a write to the ring buffer
      crosses page boundaries inside an NMI, a trylock is used on the spin
      lock instead. If the spinlock fails to be acquired, then the entry
      is discarded.
      
      This bug appeared in the ftrace work in the RT tree, where event tracing
      is reentrant. This workaround solved the deadlocks that appeared there.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
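The trylock-or-discard rule can be sketched in user space with a C11 atomic flag standing in for the per cpu spinlock. All names here (reader_lock, rb_in_nmi, cross_page_write() and the helpers) are hypothetical, and the sketch models a single CPU only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag reader_lock = ATOMIC_FLAG_INIT; /* per cpu spinlock stand-in */
static bool rb_in_nmi;				   /* per cpu "in NMI" flag stand-in */

static bool lock_try(void)
{
	return !atomic_flag_test_and_set_explicit(&reader_lock,
						  memory_order_acquire);
}

static void lock_spin(void)
{
	while (!lock_try())
		;	/* busy-wait until the holder releases the lock */
}

static void lock_release(void)
{
	atomic_flag_clear_explicit(&reader_lock, memory_order_release);
}

static void rb_nmi_enter_sim(void) { rb_in_nmi = true; }
static void rb_nmi_exit_sim(void)  { rb_in_nmi = false; }

/* Returns 0 if the page-crossing write went ahead, -1 if it was discarded. */
static int cross_page_write(void)
{
	if (rb_in_nmi) {
		if (!lock_try())
			return -1;	/* lock busy: drop the entry, never spin */
	} else {
		lock_spin();
	}
	/* ... the writer would move to the next page here ... */
	lock_release();
	return 0;
}
```

Spinning inside an NMI would never finish if the interrupted context on the same CPU holds the lock, which is why the sketch gives up and discards the entry instead.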
    • module: remove over-zealous check in __module_get() · 7f9a50a5
      Rusty Russell committed
      Impact: fix spurious BUG_ON() triggered under load
      
      module_refcount() isn't reliable outside stop_machine(), as demonstrated
      by Karsten Keil <kkeil@suse.de>, networking can trigger it under load
      (an inc on one cpu and dec on another while module_refcount() is tallying
       can give false results, for example).
      
      Almost no one should be using __module_get, but that's another issue.
      
      Cc: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 07 Feb, 2009 1 commit
  10. 06 Feb, 2009 5 commits
    • timers: split process wide cpu clocks/timers, remove spurious warning · 7d8e23df
      Ingo Molnar committed
      Mike Galbraith reported that the new warning in thread_group_cputimer()
      triggers en masse with Amarok running.
      
      Oleg Nesterov observed:
      
        Can't fastpath_timer_check()->thread_group_cputimer() have the
        false warning too? Suppose we had the timer, then posix_cpu_timer_del()
        removes this timer, but task_cputime_zero(&sig->cputime_expires) still
        not true.
      
      Remove the spurious debug warning.
      Reported-by: Mike Galbraith <efault@gmx.de>
      Explained-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • ring_buffer: remove unused flags parameter · 0a987751
      Arnaldo Carvalho de Melo committed
      Impact: API change, cleanup
      
      From ring_buffer_{lock_reserve,unlock_commit}.
      
      $ codiff /tmp/vmlinux.before /tmp/vmlinux.after
      linux-2.6-tip/kernel/trace/trace.c:
        trace_vprintk              |  -14
        trace_graph_return         |  -14
        trace_graph_entry          |  -10
        trace_function             |   -8
        __ftrace_trace_stack       |   -8
        ftrace_trace_userstack     |   -8
        tracing_sched_switch_trace |   -8
        ftrace_trace_special       |  -12
        tracing_sched_wakeup_trace |   -8
       9 functions changed, 90 bytes removed, diff: -90
      
      linux-2.6-tip/block/blktrace.c:
        __blk_add_trace |   -1
       1 function changed, 1 bytes removed, diff: -1
      
      /tmp/vmlinux.after:
       10 functions changed, 91 bytes removed, diff: -91
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • wait: prevent exclusive waiter starvation · 777c6c5f
      Johannes Weiner committed
      With exclusive waiters, every process woken up through the wait queue must
      ensure that the next waiter down the line is woken when it has finished.
      
      Interruptible waiters don't do that when aborting due to a signal.  And if
      an aborting waiter is concurrently woken up through the waitqueue, no one
      will ever wake up the next waiter.
      
      This has been observed with __wait_on_bit_lock() used by
      lock_page_killable(): the first contender on the queue was aborting when
      the actual lock holder woke it up concurrently.  The aborted contender
      didn't acquire the lock and therefore never did an unlock followed by
      waking up the next waiter.
      
      Add abort_exclusive_wait() which removes the process' wait descriptor from
      the waitqueue, iff still queued, or wakes up the next waiter otherwise.
      It does so under the waitqueue lock.  Racing with a wake up means the
      aborting process is either already woken (removed from the queue) and will
      wake up the next waiter, or it will remove itself from the queue and the
      concurrent wake up will apply to the next waiter after it.
      
      Use abort_exclusive_wait() in __wait_event_interruptible_exclusive() and
      __wait_on_bit_lock() when they were interrupted by other means than a wake
      up through the queue.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Reported-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Mentored-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Chuck Lever <cel@citi.umich.edu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>		["after some testing"]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
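The abort rule in this commit can be modelled with a single-threaded sketch. The structures and the wq_* helpers below are hypothetical and much simpler than the kernel's wait queues; in particular the waitqueue lock under which the kernel makes this decision is omitted, so the model only shows the two possible outcomes of the race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct waiter {
	struct waiter *next;
	bool queued;	/* still on the wait queue */
	bool woken;	/* a wake-up was delivered to this waiter */
};

struct wait_queue {
	struct waiter *head;
};

static void wq_add_tail(struct wait_queue *q, struct waiter *w)
{
	struct waiter **pp = &q->head;

	while (*pp)
		pp = &(*pp)->next;
	w->next = NULL;
	w->queued = true;
	w->woken = false;
	*pp = w;
}

static void wq_remove(struct wait_queue *q, struct waiter *w)
{
	struct waiter **pp = &q->head;

	while (*pp && *pp != w)
		pp = &(*pp)->next;
	if (*pp) {
		*pp = w->next;
		w->next = NULL;
		w->queued = false;
	}
}

/* Exclusive wake-up: dequeue the first waiter and mark it woken. */
static void wq_wake_one(struct wait_queue *q)
{
	struct waiter *w = q->head;

	if (!w)
		return;
	wq_remove(q, w);
	w->woken = true;
}

/* Still queued: just leave. Already woken: pass the wake-up on. */
static void abort_exclusive_wait_sim(struct wait_queue *q, struct waiter *w)
{
	if (w->queued)
		wq_remove(q, w);
	else
		wq_wake_one(q);
}
```

In both orders of the race the second waiter ends up woken: either the aborting waiter forwards the wake-up it consumed, or it leaves the queue and the later wake-up reaches the next waiter directly.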
    • fbmem: don't call copy_from/to_user() with mutex held · 1f5e31d7
      Andrea Righi committed
      Avoid calling copy_from/to_user() with fb_info->lock mutex held in fbmem
      ioctl().
      
      fb_mmap() is called with mm->mmap_sem (A) held and also acquires
      fb_info->lock (B); fb_ioctl() takes fb_info->lock (B) and does
      copy_from/to_user(), which might acquire mm->mmap_sem (A), causing a
      deadlock.
      
      NOTE: it doesn't push down the fb_info->lock into each driver's own
      fb_ioctl(), so there are still potential deadlocks elsewhere.
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Johannes Weiner <hannes@saeurebad.de>
      Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • generic swap(): don't return a value from swap() · ac7b9004
      Peter Zijlstra committed
      The swap() macro is accidentally returning the value of its first argument.
      Change it into a doesn't-return-anything macro before someone goes and
      relies upon this behaviour.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Wu Fengguang <wfg@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
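A swap() that yields no value is typically written as a do/while(0) statement macro, as in the sketch below. This is an illustration of the idea rather than a copy of the kernel's definition; it relies on the GNU __typeof__ extension (GCC and Clang), and sort2() is a hypothetical caller.

```c
#include <assert.h>

/* Statement form: exchanges a and b and cannot be used as an expression. */
#define swap(a, b) \
	do { __typeof__(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)

/* Hypothetical user: put two values in ascending order. */
static void sort2(int *lo, int *hi)
{
	if (*lo > *hi)
		swap(*lo, *hi);
}
```

With this form, code such as `x = swap(a, b);` fails to compile, so nobody can come to depend on a return value.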
  11. 05 Feb, 2009 4 commits
  12. 03 Feb, 2009 4 commits
    • sched: add missing kernel-doc in sched.h · 35626129
      Randy Dunlap committed
      Add kernel-doc notation for @lock:
      
      include/linux/sched.h:457: No description found for parameter 'lock'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • libata: implement HORKAGE_1_5_GBPS and apply it to WD My Book · 9062712f
      Tejun Heo committed
      3Gbps is often much more prone to transmission failures.  It's usually
      okay to let EH handle speed down after transmission failures but some
      WD My Book drives shut down completely after certain transmission
      failures, after which only power cycling can revive them.  Combined
      with the fact that external drives often end up with a cable assembly
      which is longer than usual and more likely to have an intervening gender
      changer, this makes these drives very likely to shut down under certain
      configurations, virtually rendering them unusable.
      
      This patch implements HORKAGE_1_5_GBPS and applies it to WD My Book
      such that 1.5Gbps is forced once the device is identified.
      
      Please take a look at the following bz for related reports.
      
        http://bugzilla.kernel.org/show_bug.cgi?id=9913
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
    • libata: clear dev->ering in smarter way · 99cf610a
      Tejun Heo committed
      dev->ering used to be cleared together with the rest of ata_device in
      ata_dev_init() which is called whenever a probing event occurs.
      dev->ering is about to be used to track probing failures so it needs
      to remain persistent over multiple probing events.  This patch
      achieves this by doing the following.
      
      * Instead of CLEAR_OFFSET, define CLEAR_BEGIN and CLEAR_END and only
        clear between BEGIN and END.  ering is moved after END.  The split
        of the persistent area is to allow hotter items to remain at the head.
      
      * ering is explicitly cleared on ata_dev_disable() and when device
        attach succeeds.  So, ering is persistent through a device's
        lifetime (unless explicitly cleared of course) and also through periods
        in between disablement of an attached device and successful detection
        of the next one.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
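Clearing only the region between two offsets can be sketched with offsetof() and memset(). The structure below is a hypothetical stand-in for ata_device with made-up fields; only the idea of a BEGIN/END pair, with ering placed after END and cleared separately, comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the error ring. */
struct ering_sim {
	int cursor;
	int entries[4];
};

struct dev_sim {
	int devno;			/* persistent head, kept hot */
	unsigned long flags;		/* first field that gets cleared */
	unsigned int xfer_mode;		/* cleared as well */
	struct ering_sim ering;		/* persistent tail: failure history */
};

#define DEV_CLEAR_BEGIN	offsetof(struct dev_sim, flags)
#define DEV_CLEAR_END	offsetof(struct dev_sim, ering)

/* Re-initialise on a probing event: wipe only [BEGIN, END). */
static void dev_init_sim(struct dev_sim *dev)
{
	memset((char *)dev + DEV_CLEAR_BEGIN, 0,
	       DEV_CLEAR_END - DEV_CLEAR_BEGIN);
}

/* Explicit clear, as done on disable or on successful attach. */
static void dev_ering_clear_sim(struct dev_sim *dev)
{
	memset(&dev->ering, 0, sizeof(dev->ering));
}
```

Re-initialising wipes the transient fields while the head and the error ring survive, so failure history accumulates across probing events until it is cleared explicitly.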
    • ide/libata: fix ata_id_is_cfa() (take 4) · 2999b58b
      Sergei Shtylyov committed
      When checking for the CFA feature set support, ata_id_is_cfa() tests bit 2 in
      word 82 of the identify data instead of word 83; it also checks the ATA/PI
      version support in word 80 (which the CompactFlash specifications have as
      reserved), so it has not the slightest chance of working on modern CF cards that
      don't have 0x848A in word 0...
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
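The corrected test can be sketched as follows. The word and bit numbers (word 0 signature 0x848A, bit 2 of word 83) come from the commit message; the macro names and id_is_cfa_sim() are hypothetical, and the real helper may apply further validity checks on word 83 that are not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define ID_WORDS		256
#define ID_CONFIG		0	/* word 0: general configuration */
#define ID_COMMAND_SET_2	83	/* word 83: command sets supported */
#define CFA_SIGNATURE		0x848A
#define CFA_FEATURE_BIT		(1u << 2)

/* Returns 1 if the identify data describes a CFA device, else 0. */
static int id_is_cfa_sim(const uint16_t *id)
{
	if (id[ID_CONFIG] == CFA_SIGNATURE)
		return 1;	/* traditional CompactFlash signature */
	/* Modern cards: rely on the feature bit in word 83, not word 82. */
	return (id[ID_COMMAND_SET_2] & CFA_FEATURE_BIT) != 0;
}
```

A card that sets only bit 2 of word 82 is not reported as CFA by this check, which is the mistake the commit describes in the old code.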