1. 19 8月, 2009 1 次提交
    • T
      clocksource: Protect the watchdog rating changes with clocksource_mutex · d0981a1b
      Thomas Gleixner 提交于
      Martin pointed out that commit 6ea41d2529 (clocksource: Call
      clocksource_change_rating() outside of watchdog_lock) has a
      theoretical reference count problem. The calls to
      clocksource_change_rating() are now done outside of the clocksource
      mutex and outside of the watchdog lock. A concurrent
      clocksource_unregister() could remove the clock.
      
      Split out the code which changes the rating from
      clocksource_change_rating() into __clocksource_change_rating().
      
      Protect the clocksource_watchdog_work() code sequence with the
      clocksource_mutex() and call __clocksource_change_rating().
      
      LKML-Reference: <alpine.LFD.2.00.0908171038420.2782@localhost.localdomain>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d0981a1b
  2. 15 8月, 2009 16 次提交
  3. 14 8月, 2009 1 次提交
    • L
      genirq: prevent wakeup of freed irq thread · 2d860ad7
      Linus Torvalds 提交于
      free_irq() can remove an irqaction while the corresponding interrupt
      is in progress, but free_irq() sets action->thread to NULL
      unconditionally, which might lead to a NULL pointer dereference in
      handle_IRQ_event() when the hard interrupt context tries to wake up
      the handler thread.
      
      Prevent this by moving the thread stop after synchronize_irq(). No
      need to set action->thread to NULL either as action is going to be
      freed anyway.
      
      This fixes a boot crash reported against preempt-rt which uses the
      mainline irq threads code to implement full irq threading.
      
      [ tglx: removed local irqthread variable ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      2d860ad7
  4. 13 8月, 2009 6 次提交
    • P
      perf_counter: Report the cloning task as parent on perf_counter_fork() · 94d5d1b2
      Peter Zijlstra 提交于
      A bug in (9f498cc5: perf_counter: Full task tracing) makes
      profiling multi-threaded apps it go belly up.
      
      [ output as: (PID:TID):(PPID:PTID) ]
      
       # ./perf report -D | grep FORK
      0x4b0 [0x18]: PERF_EVENT_FORK: (3237:3237):(3236:3236)
      0xa10 [0x18]: PERF_EVENT_FORK: (3237:3238):(3236:3236)
      0xa70 [0x18]: PERF_EVENT_FORK: (3237:3239):(3236:3236)
      0xad0 [0x18]: PERF_EVENT_FORK: (3237:3240):(3236:3236)
      0xb18 [0x18]: PERF_EVENT_FORK: (3237:3241):(3236:3236)
      
      Shows us that the test (27d028de perf report: Update for the new
      FORK/EXIT events) in builtin-report.c:
      
              /*
               * A thread clone will have the same PID for both
               * parent and child.
               */
              if (thread == parent)
                      return 0;
      
      Will clearly fail.
      
      The problem is that perf_counter_fork() reports the actual
      parent, instead of the cloning thread.
      
      Fixing that (with the below patch), yields:
      
       # ./perf report -D | grep FORK
      0x4c8 [0x18]: PERF_EVENT_FORK: (1590:1590):(1589:1589)
      0xbd8 [0x18]: PERF_EVENT_FORK: (1590:1591):(1590:1590)
      0xc80 [0x18]: PERF_EVENT_FORK: (1590:1592):(1590:1590)
      0x3338 [0x18]: PERF_EVENT_FORK: (1590:1593):(1590:1590)
      0x66b0 [0x18]: PERF_EVENT_FORK: (1590:1594):(1590:1590)
      
      Which both makes more sense and doesn't confuse perf report
      anymore.
      Reported-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: paulus@samba.org
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Arjan van de Ven <arjan@infradead.org>
      LKML-Reference: <1250172882.5241.62.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      94d5d1b2
    • P
      perf_counter: Fix an ipi-deadlock · 970892a9
      Peter Zijlstra 提交于
      perf_pending_counter() is called from IRQ context and will call
      perf_counter_disable(), however perf_counter_disable() uses
      smp_call_function_single() which doesn't fancy being used with
      IRQs disabled due to IPI deadlocks.
      
      Fix this by making it use the local __perf_counter_disable()
      call and teaching the counter_sched_out() code about pending
      disables as well.
      
      This should cover the case where a counter migrates before the
      pending queue gets processed.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey J Ashford <cjashfor@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      LKML-Reference: <20090813103655.244097721@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      970892a9
    • P
      perf: Rework/fix the whole read vs group stuff · 3dab77fb
      Peter Zijlstra 提交于
      Replace PERF_SAMPLE_GROUP with PERF_SAMPLE_READ and introduce
      PERF_FORMAT_GROUP to deal with group reads in a more generic
      way.
      
      This allows you to get group reads out of read() as well.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Corey J Ashford <cjashfor@us.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      LKML-Reference: <20090813103655.117411814@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3dab77fb
    • P
      perf_counter: Fix swcounter context invariance · bcfc2602
      Peter Zijlstra 提交于
      perf_swcounter_is_counting() uses a lock, which means we cannot
      use swcounters from NMI or when holding that particular lock,
      this is unintended.
      
      The below removes the lock, this opens up race window, but not
      worse than the swcounters already experience due to RCU
      traversal of the context in perf_swcounter_ctx_event().
      
      This also fixes the hard lockups while opening a lockdep
      tracepoint counter.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: Corey J Ashford <cjashfor@us.ibm.com>
      LKML-Reference: <1250149915.10001.66.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      bcfc2602
    • I
      perf_counter: Provide hw_perf_counter_setup_online() APIs · 28402971
      Ingo Molnar 提交于
      Provide weak aliases for hw_perf_counter_setup_online(). This is
      used by the BTS patches (for v2.6.32), but it interacts with
      fixes so propagate this upstream. (it has no effect as of yet)
      
      Also export perf_counter_output() to architecture code.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      28402971
    • A
      Remove double removal of blktrace directory · 39cbb602
      Alan D. Brunelle 提交于
      commit fd51d251
      Author: Stefan Raspl <raspl@linux.vnet.ibm.com>
      Date:   Tue May 19 09:59:08 2009 +0200
      
          blktrace: remove debugfs entries on bad path
      
      added in an explicit invocation of debugfs_remove for bt->dir, in
      blk_remove_buf_file_callback we are also getting the directory removed. On
      occasion I am seeing memory corruption that I have bisected down to
      this commit. [The testing involves a (long) series of I/O benchmarks
      with blktrace invoked around the actual runs.] I believe that this
      committed patch is correct, but the problem actually lies in the code
      in blk_remove_buf_file_callback.
      
      With this patch I am able to consistently get complete runs whereas
      previously I could not get a single run to complete.
      
      The first part of the patch simply moves the debugfs_remove below the
      relay_close: the relay_close call will remove files under bt->dir, and
      so we should not remove the directory until all the files we created
      have been removed. (Note: This is not sufficient to fix the problem -
      the file system code has ref counts on the directoy, so our invocation
      does not cause the directory to actually be removed. Nonetheless, we
      should not rely upon that feature.)
      Signed-off-by: NAlan D. Brunelle <alan.brunelle@hp.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      39cbb602
  5. 11 8月, 2009 1 次提交
    • D
      futex: Fix handling of bad requeue syscall pairing · 392741e0
      Darren Hart 提交于
      If futex_requeue(requeue_pi=1) finds a futex_q that was created by a call
      other the futex_wait_requeue_pi(), the q.rt_waiter may be null.  If so,
      this will result in an oops from the following call graph:
      
      futex_requeue()
        rt_mutex_start_proxy_lock()
          task_blocks_on_rt_mutex()
            waiter->task dereference
              OOPS
      
      We currently WARN_ON() if this is detected, clearly this is inadequate.
      If we detect a mispairing in futex_requeue(), bail out, seding -EINVAL to
      user-space.
      
      V2: Fix parenthesis warnings.
      Signed-off-by: NDarren Hart <dvhltc@us.ibm.com>
      Acked-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: John Kacur <jkacur@redhat.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7CA8C0.7010809@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      392741e0
  6. 10 8月, 2009 5 次提交
    • D
      futex: Fix compat_futex to be same as futex for REQUEUE_PI · 4dc88029
      Dinakar Guniguntala 提交于
      Need to add the REQUEUE_PI checks to the compat_sys_futex API
      as well to ensure 32 bit requeue's work fine on a 64 bit
      system. Patch is against latest tip
      Signed-off-by: NDinakar Guniguntala <dino@in.ibm.com>
      Cc: Darren Hart <dvhltc@us.ibm.com>
      LKML-Reference: <20090810130142.GA23619@in.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4dc88029
    • P
      locking, sched: Give waitqueue spinlocks their own lockdep classes · 2fc39111
      Peter Zijlstra 提交于
      Give waitqueue spinlocks their own lockdep classes when they
      are initialised from init_waitqueue_head().  This means that
      struct wait_queue::func functions can operate other waitqueues.
      
      This is used by CacheFiles to catch the page from a backing fs
      being unlocked and to wake up another thread to take a copy of
      it.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NTakashi Iwai <tiwai@suse.de>
      Cc: linux-cachefs@redhat.com
      Cc: torvalds@osdl.org
      Cc: akpm@linux-foundation.org
      LKML-Reference: <20090810113305.17284.81508.stgit@warthog.procyon.org.uk>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2fc39111
    • P
      perf_counter: Require CAP_SYS_ADMIN for raw tracepoint data · a4e95fc2
      Peter Zijlstra 提交于
      Raw tracepoint data contains various kernel internals and
      data from other users, so restrict this to CAP_SYS_ADMIN.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249896452.17467.75.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a4e95fc2
    • P
      perf_counter: Correct PERF_SAMPLE_RAW output · a044560c
      Peter Zijlstra 提交于
      PERF_SAMPLE_* output switches should unconditionally output the
      correct format, as they are the only way to unambiguously parse
      the PERF_EVENT_SAMPLE data.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249896447.17467.74.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a044560c
    • D
      futex: Update futex_q lock_ptr on requeue proxy lock · beda2c7e
      Darren Hart 提交于
      futex_requeue() can acquire the lock on behalf of a waiter
      early on or during the requeue loop if it is uncontended or in
      the event of a lock steal or owner died. On wakeup, the waiter
      (in futex_wait_requeue_pi()) cleans up the pi_state owner using
      the lock_ptr to protect against concurrent access to the
      pi_state. The pi_state is hung off futex_q's on the requeue
      target futex hash bucket so the lock_ptr needs to be updated
      accordingly.
      
      The problem manifested by triggering the WARN_ON in
      lookup_pi_state() about the pid != pi_state->owner->pid.  With
      this patch, the pi_state is properly guarded against concurrent
      access via the requeue target hb lock.
      
      The astute reviewer may notice that there is a window of time
      between when futex_requeue() unlocks the hb locks and when
      futex_wait_requeue_pi() will acquire hb2->lock.  During this
      time the pi_state and uval are not in sync with the underlying
      rtmutex owner (but the uval does indicate there are waiters, so
      no atomic changes will occur in userspace).  However, this is
      not a problem. Should a contending thread enter
      lookup_pi_state() and acquire hb2->lock before the ownership is
      fixed up, it will find the pi_state hung off a waiter's
      (possibly the pending owner's) futex_q and block on the
      rtmutex.  Once futex_wait_requeue_pi() fixes up the owner, it
      will also move the pi_state from the old owner's
      task->pi_state_list to its own.
      
      v3: Fix plist lock name for application to mainline (rather
          than -rt) Compile tested against tip/v2.6.31-rc5.
      Signed-off-by: NDarren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Dinakar Guniguntala <dino@in.ibm.com>
      Cc: John Stultz <johnstul@linux.vnet.ibm.com>
      LKML-Reference: <4A7F4EFF.6090903@us.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      beda2c7e
  7. 09 8月, 2009 7 次提交
    • P
      perf_counter: Fix a race on perf_counter_ctx · 3a80b4a3
      Peter Zijlstra 提交于
      While extending perfcounters with BTS hw-tracing, Markus
      Metzger managed to trigger this warning:
      
         [  995.557128] WARNING: at kernel/perf_counter.c:1191 __perf_counter_task_sched_out+0x48/0x6b()
      
      triggers because commit
      9f498cc5 (perf_counter: Full
      task tracing) removed clearing of tsk->perf_counter_ctxp out
      from under ctx->lock which introduced a race (against
      perf_lock_task_context).
      
      Move it back and deal with the exit notification by explicitly
      passing along the former task context.
      Reported-by: NMarkus T Metzger <markus.t.metzger@intel.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249667341.17467.5.camel@twins>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a80b4a3
    • F
      perf_counter: Fix tracepoint sampling to be part of generic sampling · 3a43ce68
      Frederic Weisbecker 提交于
      Based on Peter's comments, make tracepoint sampling generic
      just like all the other sampling bits are. This is a rename
      with no code changes:
      
      - PERF_SAMPLE_TP_RECORD to PERF_SAMPLE_RAW
      - struct perf_tracepoint_record to perf_raw_record
      
      We want the system in place that transport tracepoints raw
      samples events into the perf ring buffer to be generalized and
      usable by any type of counter.
      
      Reported-by; Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249698400-5441-4-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a43ce68
    • F
      perf_counter: Work around gcc warning by initializing tracepoint record unconditionally · 10b8e306
      Frederic Weisbecker 提交于
      Despite that the tracepoint record is always present when the
      PERF_SAMPLE_TP_RECORD flag is set, gcc raises a warning,
      thinking it might not be initialized:
      
        kernel/perf_counter.c: In function ‘perf_counter_output’:
        kernel/perf_counter.c:2650: warning: ‘tp’ may be used uninitialized in this function
      
      Then, initialize it to NULL and always check if it's not NULL
      before dereference it.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1249698400-5441-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      10b8e306
    • P
      perf_counter: Fix software counters for fast moving event sources · 7b4b6658
      Peter Zijlstra 提交于
      Reimplement the software counters to deal with fast moving
      event sources (such as tracepoints). This means being able
      to generate multiple overflows from a single 'event' as well
      as support throttling.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7b4b6658
    • F
      perf_counter: Fix/complete ftrace event records sampling · f413cdb8
      Frederic Weisbecker 提交于
      This patch implements the kernel side support for ftrace event
      record sampling.
      
      A new counter sampling attribute is added:
      
         PERF_SAMPLE_TP_RECORD
      
      which requests ftrace events record sampling. In this case
      if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
      fires, we emit the tracepoint binary record to the
      perfcounter event buffer, as a sample.
      
      Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
      record:
      
       perf record -f -F 1 -a -e workqueue:workqueue_execution
       perf report -D
      
       0x21e18 [0x48]: event: 9
       .
       . ... raw event: size 72 bytes
       .  0000:  09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff  ......H........
       .  0010:  0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00  ........!......
       .  0020:  2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e  +...........eve
       .  0030:  74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00  ts/1...........
       .  0040:  e0 b1 31 81 ff ff ff ff                          .......
      .
      0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33
      
      The raw ftrace binary record starts at offset 0020.
      
      Translation:
      
       struct trace_entry {
      	type		= 0x2b = 43;
      	flags		= 1;
      	preempt_count	= 2;
      	pid		= 0xa = 10;
      	tgid		= 0xa = 10;
       }
      
       thread_comm = "events/1"
       thread_pid  = 0xa = 10;
       func	    = 0xffffffff8131b1e0 = flush_to_ldisc()
      
      What will come next?
      
       - Userspace support ('perf trace'), 'flight data recorder' mode
         for perf trace, etc.
      
       - The unconditional copy from the profiling callback brings
         some costs however if someone wants no such sampling to
         occur, and needs to be fixed in the future. For that we need
         to have an instant access to the perf counter attribute.
         This is a matter of a flag to add in the struct ftrace_event.
      
       - Take care of the events recursivity! Don't ever try to record
         a lock event for example, it seems some locking is used in
         the profiling fast path and lead to a tracing recursivity.
         That will be fixed using raw spinlock or recursivity
         protection.
      
       - [...]
      
       - Profit! :-)
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tom Zanussi <tzanussi@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f413cdb8
    • P
      perf_counter, ftrace: Fix perf_counter integration · 3a659305
      Peter Zijlstra 提交于
      Adds possible second part to the assign argument of TP_EVENT().
      
        TP_perf_assign(
      	__perf_count(foo);
      	__perf_addr(bar);
        )
      
      Which, when specified make the swcounter increment with @foo instead
      of the usual 1, and report @bar for PERF_SAMPLE_ADDR (data address
      associated with the event) when this triggers a counter overflow.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      3a659305
    • S
      posix_cpu_timers_exit_group(): Do not use thread_group_cputimer() · 17d42c1c
      Stanislaw Gruszka 提交于
      When the process exits we don't have to run new cputimer nor
      use running one (as it not accounts when tsk->exit_state != 0)
      to get process CPU times.  As there is only one thread we can
      just use CPU times fields from task and signal structs.
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Vitaly Mayatskikh <vmayatsk@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      17d42c1c
  8. 08 8月, 2009 3 次提交