1. 10 September 2010 (1 commit)
    • generic-ipi: Fix deadlock in __smp_call_function_single · 27c379f7
      Heiko Carstens authored
      Just got my 6-way machine into a state where cpu 0 is stuck in an
      endless loop within __smp_call_function_single().
      All other cpus are idle.
      
      The call trace on cpu 0 looks like this:
      
       __smp_call_function_single
       scheduler_tick
       update_process_times
       tick_sched_timer
       __run_hrtimer
       hrtimer_interrupt
       clock_comparator_work
       do_extint
       ext_int_handler
       ----> timer irq
       cpu_idle
      
      __smp_call_function_single() got called from nohz_balancer_kick()
      (inlined) with the remote cpu being 1, wait being 0 and the per
      cpu variable remote_sched_softirq_cb (call_single_data) of the
      current cpu (0).
      
      Then it loops forever when it tries to grab the lock of the
      call_single_data, since it is already locked and enqueued on cpu 0.
      
      My theory of how this could have happened: for some reason the
      scheduler decided to call __smp_call_function_single() on its own
      cpu, and sent an IPI to itself. The interrupt stays pending
      since IRQs are disabled. If the hypervisor then schedules the
      cpu away, it might happen that upon rescheduling both the IPI and
      the timer IRQ are pending. When interrupts are enabled again,
      it depends on which one gets delivered first.
      If the timer interrupt gets delivered first we end up with the
      local deadlock as seen in the call trace above.
      
      Let's make __smp_call_function_single() check if the target cpu is
      the current cpu and execute the function immediately just like
      smp_call_function_single does. That should prevent at least the
      scenario described here.
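      
      A minimal sketch of that check (hedged; names simplified, and the real
      function also handles csd locking, the wait flag and the remote path):
      
        #include <linux/smp.h>
        #include <linux/irqflags.h>
        
        /* Hedged sketch -- kernel-context fragment, not the exact upstream diff.
         * Caller is assumed to have preemption disabled. */
        static void smp_call_single_sketch(int cpu, struct call_single_data *csd)
        {
                unsigned long flags;
        
                if (cpu == smp_processor_id()) {
                        /* Target is ourselves: run the function right here with
                         * IRQs off instead of queueing a self-IPI that could
                         * deadlock against a pending timer interrupt. */
                        local_irq_save(flags);
                        csd->func(csd->info);
                        local_irq_restore(flags);
                        return;
                }
        
                /* Otherwise: lock the csd and take the normal remote-IPI path. */
        }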
      
      It might also be that the scheduler is not supposed to call
      __smp_call_function_single with the remote cpu being the current
      cpu, but that is a different issue.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jens Axboe <jaxboe@fusionio.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20100910114729.GB2827@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      27c379f7
  2. 28 May 2010 (1 commit)
  3. 30 March 2010 (1 commit)
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h · 5a0e3ad6
      Tejun Heo authored
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
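      
      As a hedged illustration (hypothetical file, not part of this patch), a
      typical post-sweep change simply makes the dependency explicit:
      
        /* drivers/example/foo.c (hypothetical) -- before the sweep this compiled
         * only because <linux/percpu.h> dragged in <linux/slab.h> implicitly. */
        #include <linux/module.h>
        #include <linux/percpu.h>
        #include <linux/slab.h>         /* added: this file calls kmalloc()/kfree() */
        
        static void *foo_buf;
        
        static int __init foo_init(void)
        {
                foo_buf = kmalloc(128, GFP_KERNEL);   /* needs slab.h for kmalloc() */
                return foo_buf ? 0 : -ENOMEM;
        }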
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script was used as the basis of the conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following:
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It is put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition, and for others adding it to an
         implementation .h or embedding .c file was more appropriate.  This
         step added inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them, as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from the tests in
      step 6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be discoverable easily on most builds of the
      specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  4. 18 January 2010 (1 commit)
  5. 17 January 2010 (1 commit)
    • smp_call_function_any(): pass the node value to cpumask_of_node() · af2422c4
      David John authored
      The change in acpi_cpufreq to use smp_call_function_any causes a warning
      when it is called since the function erroneously passes the cpu id to
      cpumask_of_node rather than the node that the cpu is on.  Fix this.
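      
      The gist of the fix, as a hedged sketch (helper name hypothetical):
      translate the cpu to its node before asking for the node's cpumask.
      
        #include <linux/cpumask.h>
        #include <linux/topology.h>
        
        /* Hedged sketch of the corrected lookup inside smp_call_function_any(). */
        static const struct cpumask *node_mask_for_cpu(int cpu)
        {
                /* cpumask_of_node() takes a node id, not a cpu id */
                return cpumask_of_node(cpu_to_node(cpu));   /* was: cpumask_of_node(cpu) */
        }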
      
      cpumask_of_node(3): node > nr_node_ids(1)
      Pid: 1, comm: swapper Not tainted 2.6.33-rc3-00097-g2c1f1895 #223
      Call Trace:
       [<ffffffff81028bb3>] cpumask_of_node+0x23/0x58
       [<ffffffff81061f51>] smp_call_function_any+0x65/0xfa
       [<ffffffff810160d1>] ? do_drv_read+0x0/0x2f
       [<ffffffff81015fba>] get_cur_val+0xb0/0x102
       [<ffffffff81016080>] get_cur_freq_on_cpu+0x74/0xc5
       [<ffffffff810168a7>] acpi_cpufreq_cpu_init+0x417/0x515
       [<ffffffff81562ce9>] ? __down_write+0xb/0xd
       [<ffffffff8148055e>] cpufreq_add_dev+0x278/0x922
      Signed-off-by: David John <davidjon@xenontk.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af2422c4
  6. 16 December 2009 (1 commit)
  7. 15 December 2009 (1 commit)
  8. 18 November 2009 (1 commit)
    • generic-ipi: Add smp_call_function_any() · 2ea6dec4
      Rusty Russell authored
      Andrew points out that acpi-cpufreq uses cpumask_any, when it really
      would prefer to use the same CPU if possible (to avoid an IPI).  In
      general, this seems a good idea to offer.
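      
      A hedged sketch of the selection preference (not the exact upstream loop):
      prefer the current cpu, then a cpu on the same node, and only then any
      cpu in the mask.
      
        #include <linux/cpumask.h>
        #include <linux/smp.h>
        #include <linux/topology.h>
        
        /* Hedged sketch; the real smp_call_function_any() issues the call
         * before re-enabling preemption. */
        static int pick_ipi_target(const struct cpumask *mask)
        {
                int cpu = get_cpu();            /* also disables preemption */
                unsigned int target;
        
                if (cpumask_test_cpu(cpu, mask)) {
                        target = cpu;           /* local call, no IPI at all */
                } else {
                        /* next best: a cpu sharing our node */
                        target = cpumask_any_and(cpumask_of_node(cpu_to_node(cpu)), mask);
                        if (target >= nr_cpu_ids)
                                target = cpumask_any(mask);     /* any cpu will do */
                }
                put_cpu();
                return target;
        }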
      
      [ tglx: Documented selection preference and Inlined the UP case to
        	avoid the copy of smp_call_function_single() and the extra
        	EXPORT ]
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Zhao Yakui <yakui.zhao@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: "Zhang, Yanmin" <yanmin_zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      2ea6dec4
  9. 23 October 2009 (1 commit)
  10. 24 September 2009 (1 commit)
  11. 23 September 2009 (1 commit)
    • generic-ipi: make struct call_function_data lockless · 54fdade1
      Xiao Guangrong authored
      This patch removes the spinlock from struct call_function_data; the
      reasons are below:
      
      1: Add a new cpumask interface, cpumask_test_and_clear_cpu(), which
         atomically tests and clears a specific cpu (see the sketch after this
         list).  We can use it instead of the cpumask_test_cpu() +
         cpumask_clear_cpu() pair, so data->lock is no longer needed to
         protect them in generic_smp_call_function_interrupt().
      
      2: In smp_call_function_many(), once csd_lock() returns, the current
         cpu's cfd_data has been removed from the call_function list, so
         there is no race with other cpus.  cfd_data is then only used from
         smp_call_function_many(), which must be called with preemption
         disabled and not from a hardware interrupt handler or a bottom half,
         so only the corresponding cpu can use it; there is no race on the
         current cpu either, and cfd_data->lock is not needed to protect it.
      
      3: After 1 and 2, cfd_data->lock is only used to protect cfd_data->refs
         in generic_smp_call_function_interrupt(), so we can make
         cfd_data->refs an atomic_t and drop cfd_data->lock entirely.
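      
      A hedged sketch of the semantics of the new helper (point 1 above); the
      suffix marks it as illustrative:
      
        #include <linux/bitops.h>
        #include <linux/cpumask.h>
        
        /* Atomically test and clear @cpu's bit in @mask, returning the old value,
         * so callers no longer need a spinlock around a separate test + clear. */
        static inline int cpumask_test_and_clear_cpu_sketch(int cpu, struct cpumask *mask)
        {
                return test_and_clear_bit(cpu, cpumask_bits(mask));
        }
      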
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      [akpm@linux-foundation.org: use atomic_dec_return()]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54fdade1
  12. 22 August 2009 (1 commit)
  13. 08 August 2009 (1 commit)
  14. 09 June 2009 (1 commit)
  15. 13 March 2009 (1 commit)
  16. 25 February 2009 (4 commits)
    • generic-ipi: cleanups · 0b13fda1
      Ingo Molnar authored
      Andrew pointed out that there's some small amount of
      style rot in kernel/smp.c.
      
      Clean it up.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      0b13fda1
    • generic-ipi: remove CSD_FLAG_WAIT · 6e275637
      Peter Zijlstra authored
      Oleg noticed that we don't strictly need CSD_FLAG_WAIT; rework
      the code so that we can use CSD_FLAG_LOCK for both purposes.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      6e275637
    • generic-ipi: remove kmalloc() · 8969a5ed
      Peter Zijlstra authored
      Remove the use of kmalloc() from the smp_call_function_*()
      calls.
      
      Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
      data for single cpu ipi calls) started the discussion on the use
      of kmalloc() in this code and fixed the
      smp_call_function_single(.wait=0) fallback case.
      
      In this patch we complete this by also providing means for the
      _many() call, which fully removes the need for kmalloc() in this
      code.
      
      The problem with the _many() call is that other cpus might still
      be observing our entry when we're done with it. The old code solved
      this by dynamically allocating data elements and RCU-freeing them.
      
      We solve it by using a single per-cpu entry which provides
      static storage and solves one half of the problem (avoiding
      referencing freed data).
      
      The other half, ensuring that queue iteration is still possible,
      is done by placing re-used entries at the head of the list. This
      means that if someone was still iterating over that entry when it
      got moved, they will now re-visit the entries on the list they had
      already seen, but will not skip over entries, as would have
      happened had we placed the new entry at the end.
      
      Furthermore, visiting entries twice is not a problem, since we
      remove our cpu from the entry's cpumask once its function has been called.
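      
      A hedged sketch of the re-queueing (names illustrative; the upstream code
      uses the RCU list primitives): the per-cpu entry goes back on the head of
      the queue, which is what makes a concurrent iterator revisit rather than
      skip.
      
        #include <linux/list.h>
        #include <linux/percpu.h>
        #include <linux/spinlock.h>
        
        struct cfd_entry_sketch {
                struct list_head list;
                /* func, info, cpumask of pending cpus, refcount ... */
        };
        
        static DEFINE_PER_CPU(struct cfd_entry_sketch, cfd_sketch);
        static LIST_HEAD(call_queue_sketch);
        static DEFINE_SPINLOCK(call_queue_lock_sketch);
        
        static void requeue_this_cpus_entry(void)
        {
                struct cfd_entry_sketch *e = &get_cpu_var(cfd_sketch);
                unsigned long flags;
        
                spin_lock_irqsave(&call_queue_lock_sketch, flags);
                list_add(&e->list, &call_queue_sketch);   /* head, not tail */
                spin_unlock_irqrestore(&call_queue_lock_sketch, flags);
                put_cpu_var(cfd_sketch);
        }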
      
      Many thanks to Oleg for his suggestions and him poking holes in
      my earlier attempts.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      8969a5ed
    • generic IPI: simplify barriers and locking · 15d0d3b3
      Nick Piggin authored
      Simplify the barriers in generic remote function call interrupt
      code.
      
      Firstly, just unconditionally take the lock and check the list
      in the generic_call_function_single_interrupt IPI handler. As
      we've just taken an IPI here, the chances are fairly high that
      there will be work on the list for us, so do the locking
      unconditionally. This removes the tricky lockless list_empty
      check and dubious barriers. The change looks bigger than it is
      because it is just removing an outer loop.
      
      Secondly, clarify architecture specific IPI locking rules.
      Generic code has no tools to impose any sane ordering on IPIs if
      they go outside normal cache coherency, ergo the arch code must
      make them appear to obey cache coherency as a "memory operation"
      to initiate an IPI, and a "memory operation" to receive one.
      This way at least they can be reasoned about in generic code,
      and smp_mb used to provide ordering.
      
      The combination of these two changes means that explicit barriers
      can be taken out of queue handling for the single case -- shared
      data is explicitly locked, and IPI ordering must conform to
      that, so no barriers are needed. An extra barrier is needed in the
      many handler, so as to ensure we load the list element after the
      IPI is received.
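      
      As a hedged sketch, the resulting pairing looks roughly like this
      (comments only mark where the real queueing and list walking happen):
      
        #include <linux/smp.h>
        
        static void ipi_sender_side_sketch(void)
        {
                /* ... add our element to the call_function queue ... */
                smp_mb();       /* publish the list update before the IPI "memory op" */
                /* ... arch code sends the IPI ... */
        }
        
        static void ipi_many_handler_sketch(void)
        {
                smp_mb();       /* order IPI receipt before loading the list element */
                /* ... walk the queue and run the queued functions ... */
        }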
      
      Does any architecture actually *need* these barriers? For the
      initiator I could see it, but for the handler I would be
      surprised. So the other thing we could do for simplicity is just
      to require that, rather than just matching with cache coherency,
      we just require a full barrier before generating an IPI, and
      after receiving an IPI. In which case, the smp_mb()s can go
      away. But just for now, we'll be on the safe side and use the
      barriers (they're in the slow case anyway).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-arch@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      15d0d3b3
  17. 31 January 2009 (1 commit)
    • generic-ipi: use per cpu data for single cpu ipi calls · d7240b98
      Steven Rostedt authored
      smp_call_function() can be passed a wait parameter telling it to
      wait for all the functions running on other CPUs to complete before
      returning, or to return without waiting. Unfortunately, this is
      currently just a suggestion and not mandatory. That is,
      smp_call_function() can decide not to return immediately and wait instead.
      
      The reason for this is that it uses kmalloc to allocate storage
      to send to the called CPU, and that CPU will free it when it is done.
      But if we fail to allocate the storage, the stack is used instead.
      This means we must wait for the called CPU to finish before
      continuing.
      
      Unfortunately, some callers do not abide by this hint and act as if
      the non-wait option is mandatory. The MTRR code, for instance, will
      deadlock if smp_call_function is set to wait. This is because
      the smp_call_function will wait for the other CPUs to finish their
      called functions, but those functions are waiting on the caller to
      continue.
      
      This patch changes the generic smp_call_function code to use per-cpu
      variables if the allocation of the data fails for a single CPU call.
      smp_call_function_many will fall back to smp_call_function_single
      if it fails its alloc. smp_call_function_single is modified
      to not force the wait state.
      
      Since we are now using a single data item per cpu, we must synchronize
      the callers to prevent a second caller from modifying the data before
      the first caller's IPI functions complete. To do so, I added a flag to
      the call_single_data called CSD_FLAG_LOCK. When the single CPU is
      called (which can happen when a many call fails an alloc), we
      set the LOCK bit on this per-cpu data. When the caller finishes,
      it clears the LOCK bit.
      
      A caller must wait until the LOCK bit is cleared before setting
      it; when it is cleared, there is no IPI function using the data.
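      
      A hedged sketch of that locking protocol (flag value, names and layout
      illustrative, not the kernel's call_single_data):
      
        #include <linux/smp.h>
        
        #define CSD_FLAG_LOCK_SKETCH    0x02
        
        struct csd_sketch {
                void (*func)(void *info);
                void *info;
                unsigned int flags;
        };
        
        static void csd_lock_sketch(struct csd_sketch *csd)
        {
                /* spin until the previous user of this per-cpu csd is done with it */
                while (csd->flags & CSD_FLAG_LOCK_SKETCH)
                        cpu_relax();
                csd->flags |= CSD_FLAG_LOCK_SKETCH;
                smp_mb();       /* claim the csd before touching its payload */
        }
        
        static void csd_unlock_sketch(struct csd_sketch *csd)
        {
                smp_mb();       /* payload no longer needed before release */
                csd->flags &= ~CSD_FLAG_LOCK_SKETCH;
        }
      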
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      d7240b98
  18. 01 January 2009 (1 commit)
  19. 30 December 2008 (2 commits)
    • cpumask: arch_send_call_function_ipi_mask: core · ce47d974
      Rusty Russell authored
      Impact: new API to reduce stack usage
      
      We're weaning the core code off handing cpumasks around on-stack.
      This introduces arch_send_call_function_ipi_mask().
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      ce47d974
    • cpumask: smp_call_function_many() · 54b11e6d
      Rusty Russell authored
      Impact: Implementation change to remove cpumask_t from stack.
      
      Actually change smp_call_function_mask() to smp_call_function_many().
      We avoid cpumasks on the stack in this version.
      
      (S390 has its own version, but that's going away apparently).
      
      We have to do some dancing to figure out if 0 or 1 other cpus are in
      the mask supplied and the online mask without allocating a tmp
      cpumask.  It's still fairly cheap.
      
      We allocate the cpumask at the end of the call_function_data
      structure: if allocation fails, we fall back to smp_call_function_single
      rather than using the baroque quiescing code (which needs a cpumask on
      the stack).
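      
      A hedged sketch of that layout (field names illustrative): the mask is a
      flexible array at the tail, sized with cpumask_size(), so no cpumask ever
      lives on the stack.
      
        #include <linux/cpumask.h>
        #include <linux/slab.h>
        
        struct cfd_sketch {
                unsigned int refs;
                /* csd, lock, rcu head ... */
                unsigned long cpumask_bits[];   /* nr_cpu_ids bits follow the struct */
        };
        
        static struct cfd_sketch *cfd_alloc_sketch(gfp_t gfp)
        {
                return kmalloc(sizeof(struct cfd_sketch) + cpumask_size(), gfp);
        }
        
        static struct cpumask *cfd_mask_sketch(struct cfd_sketch *cfd)
        {
                return to_cpumask(cfd->cpumask_bits);   /* view the tail as a cpumask */
        }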
      
      (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: npiggin@suse.de
      Cc: axboe@kernel.dk
      54b11e6d
  20. 06 November 2008 (1 commit)
    • generic-ipi: fix the smp_mb() placement · 561920a0
      Suresh Siddha authored
      smp_mb() is needed on the sender (to make the memory operations visible
      globally) before sending the IPI, and the receiver (on Alpha at least)
      needs smp_read_barrier_depends() in the handler before reading the
      call_single_queue list in a lock-free fashion.
      
      On x86, x2apic-mode register accesses for sending IPIs don't have
      serializing semantics, so the need for smp_mb() before sending the IPI
      becomes more critical in x2apic mode.
      
      Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
      smp_mb() doesn't mean anything on the sender when the IPI receiver is not
      doing anything special (like a memory fence) after clearing CSD_FLAG_WAIT.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      561920a0
  21. 26 August 2008 (1 commit)
  22. 12 August 2008 (1 commit)
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix · c2fc1198
      Nick Piggin authored
      > > Nick Piggin (1):
      > >       generic-ipi: fix stack and rcu interaction bug in
      > > smp_call_function_mask()
      >
      > I'm still not 100% sure that I have this patch right... I might have seen
      > a lockup trace implicating the smp call function path... which may have
      > been due to some other problem or a different bug in the new call function
      > code, but if some more people can take a look at it before merging?
      
      OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
      data properly in the alloc && wait case. Secondly, I wasn't resetting
      CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
      wait).
      
      After those fixes, the patch boots and runs with the kmalloc commented
      out (so it always executes the slowpath).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      c2fc1198
  23. 11 August 2008 (1 commit)
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c
      Nick Piggin authored
      * Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
      
      > Found a OOPS on a big SMP box during an overnight reboot test with
      > upstream git.
      >
      > Suresh and I looked at the oops and looks like the root cause is in
      > generic_smp_call_function_interrupt() and smp_call_function_mask() with
      > wait parameter.
      >
      > The actual oops looked like
      >
      > [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
      > [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.278155] PGD 202063 PUD 0
      > [   11.278576] Oops: 0010 [1] SMP
      > [   11.279006] CPU 5
      > [   11.279336] Modules linked in:
      > [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7 #290
      > [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
      > [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
      > [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
      > [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
      > [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
      > [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
      > [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
      > [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
      > [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
      > [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
      > [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
      > [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
      > [   11.287120] Call Trace:
      > [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
      > [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
      > [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
      > [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
      > [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
      > [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
      > [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
      > [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
      > [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
      > [   11.291010]
      > [   11.291287]
      > [   11.291654] Code:  Bad RIP value.
      > [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.292380]  RSP <ffff88027f1f7f70>
      > [   11.292741] CR2: ffff8802ffffffff
      > [   11.310951] ---[ end trace 137c54d525305f1c ]---
      >
      > The problem is with the following sequence of events:
      >
      > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
      > - CPU A sets up the call_function_data on the stack and does an rcu add to
      >   call_function_queue
      > - CPU A waits until the WAIT flag is cleared
      > - CPU B gets the call function interrupt and starts going through the
      >   call_function_queue
      > - CPU C also gets some other call function interrupt and starts going through
      >   the call_function_queue
      > - CPU C, which is also going through the call_function_queue, starts referencing
      >   CPU A's stack, as that element is still in call_function_queue
      > - CPU B finishes the function call that CPU A set up and as there are no other
      >   references to it, rcu deletes the call_function_data (which was from CPU A
      >   stack)
      > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
      > - CPU A which was waiting on the flag continues executing and the stack
      >   contents change
      >
      > - CPU C, still inside an rcu_read section accessing CPU A's stack, sees
      >   inconsistent call_function_data and can try to execute the
      >   function with some random pointer, causing stack corruption for A
      >   (by clearing the bits in the mask field) and an oops.
      
      Nice debugging work.
      
      I'd suggest something like the attached (boot tested) patch as the simple
      fix for now.
      
      I expect the benefits from the less synchronized, multiple-in-flight-data
      global queue will still outweigh the costs of dynamic allocations. But
      if worst comes to worst then we just go back to a globally synchronous
      one-at-a-time implementation, but that would be pretty sad!
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cc7a486c
  24. 27 July 2008 (1 commit)
  25. 16 July 2008 (1 commit)
  26. 27 June 2008 (1 commit)
  27. 26 June 2008 (2 commits)