1. 23 Sep, 2009 1 commit
    • generic-ipi: make struct call_function_data lockless · 54fdade1
      Committed by Xiao Guangrong
      This patch removes the spinlock from struct call_function_data, for
      the following reasons:
      
      1: Add a new cpumask interface, cpumask_test_and_clear_cpu(), which
         atomically tests and clears a specific cpu. We can use it instead
         of cpumask_test_cpu() plus cpumask_clear_cpu(), so data->lock is no
         longer needed to protect those in generic_smp_call_function_interrupt().
      
      2: In smp_call_function_many(), after csd_lock() returns, the current
         cpu's cfd_data has been deleted from the call_function list, so there
         is no race with other cpus. After that, cfd_data is only used in
         smp_call_function_many(), which must be called with preemption
         disabled and not from a hardware interrupt handler or a bottom half
         handler. Only the corresponding cpu can use it, so there is no race
         on the current cpu either, and cfd_data->lock is not needed to
         protect it.
      
      3: After 1 and 2, cfd_data->lock is only used to protect cfd_data->refs
         in generic_smp_call_function_interrupt(), so we can turn
         cfd_data->refs into an atomic_t and drop cfd_data->lock entirely.
      Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      [akpm@linux-foundation.org: use atomic_dec_return()]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
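A rough user-space illustration of points 1 and 3 in the message above, using C11 atomics instead of the kernel's cpumask and atomic_t APIs. The names `cfd_sketch`, `mask_test_and_clear_cpu()`, `handle_entry()` and `count_call()` are invented for this sketch and are not kernel interfaces; the mask is limited to one `unsigned long`.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct call_function_data (not the kernel type). */
struct cfd_sketch {
	atomic_ulong cpumask;	/* one bit per target cpu */
	atomic_int refs;	/* cpus that still have to run the callback */
};

/* Atomically clear bit `cpu` and report whether it was set beforehand. */
static bool mask_test_and_clear_cpu(atomic_ulong *mask, unsigned int cpu)
{
	unsigned long bit;

	assert(cpu < CHAR_BIT * sizeof(unsigned long));
	bit = 1UL << cpu;
	return (atomic_fetch_and(mask, ~bit) & bit) != 0;
}

/*
 * What one cpu's interrupt handler might do with an entry, without a lock.
 * Returns true if this cpu was the last one and may release the entry.
 */
static bool handle_entry(struct cfd_sketch *d, unsigned int cpu,
			 void (*func)(void *), void *info)
{
	if (!mask_test_and_clear_cpu(&d->cpumask, cpu))
		return false;	/* not addressed to us, or already handled */

	func(info);

	/* atomic_fetch_sub() returns the old value: 1 means we reached zero. */
	return atomic_fetch_sub(&d->refs, 1) == 1;
}

/* Example callback: counts how many times it ran. */
static void count_call(void *info)
{
	++*(int *)info;
}
```

With the mask set to cpus 1 and 2 and `refs` set to 2, only the second of the two handlers sees `handle_entry()` return true, and a repeated call for a cpu whose bit is already clear does nothing.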
  2. 22 Aug, 2009 1 commit
  3. 08 Aug, 2009 1 commit
  4. 09 Jun, 2009 1 commit
  5. 13 Mar, 2009 1 commit
  6. 25 Feb, 2009 4 commits
    • generic-ipi: cleanups · 0b13fda1
      Committed by Ingo Molnar
      Andrew pointed out that there's some small amount of
      style rot in kernel/smp.c.
      
      Clean it up.
      Reported-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic-ipi: remove CSD_FLAG_WAIT · 6e275637
      Committed by Peter Zijlstra
      Oleg noticed that we don't strictly need CSD_FLAG_WAIT, rework
      the code so that we can use CSD_FLAG_LOCK for both purposes.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • generic-ipi: remove kmalloc() · 8969a5ed
      Committed by Peter Zijlstra
      Remove the use of kmalloc() from the smp_call_function_*()
      calls.
      
      Steven's generic-ipi patch (d7240b98: generic-ipi: use per cpu
      data for single cpu ipi calls) started the discussion on the use
      of kmalloc() in this code and fixed the
      smp_call_function_single(.wait=0) fallback case.
      
      In this patch we complete this by also providing means for the
      _many() call, which fully removes the need for kmalloc() in this
      code.
      
      The problem with the _many() call is that other cpus might still
      be observing our entry when we're done with it. The old code solved
      this by dynamically allocating data elements and RCU-freeing them.
      
      We solve it by using a single per-cpu entry which provides
      static storage and solves one half of the problem (avoiding
      referencing freed data).
      
      The other half, ensuring the queue iteration is still possible,
      is done by placing re-used entries at the head of the list. This
      means that if someone was still iterating that entry when it got
      moved, he will now re-visit the entries on the list he had
      already seen, but avoids skipping over entries as would have
      happened had we placed the new entry at the end.
      
      Furthermore, visiting entries twice is not a problem, since we
      remove our cpu from the entry's cpumask once it's called.
      
      Many thanks to Oleg for his suggestions and him poking holes in
      my earlier attempts.
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
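The "re-used entries go to the head" argument can be replayed in a single thread. The sketch below is only a simulation of one possible interleaving, not kernel code: `struct entry`, `push_head()`, `unlink_entry()`, `visit()` and `reuse_demo()` are hypothetical names, and a plain singly linked list stands in for the RCU-protected queue.

```c
#include <stddef.h>

#define NENT 3

struct entry {
	struct entry *next;
	unsigned long cpumask;	/* cpus that still have to run this entry */
	int calls;		/* how often the callback ran (demo only) */
};

struct queue {
	struct entry *head;
};

static void push_head(struct queue *q, struct entry *e)
{
	e->next = q->head;
	q->head = e;
}

/* Unlink e but leave e->next alone, as an RCU-style deletion would. */
static void unlink_entry(struct queue *q, struct entry *e)
{
	struct entry **pp = &q->head;

	while (*pp && *pp != e)
		pp = &(*pp)->next;
	if (*pp)
		*pp = e->next;
}

/* Run the entry for `cpu` at most once: the mask bit is cleared on the call. */
static void visit(struct entry *e, unsigned int cpu)
{
	unsigned long bit = 1UL << cpu;

	if (e->cpumask & bit) {
		e->cpumask &= ~bit;
		e->calls++;
	}
}

/* Replays the interleaving; returns how many entries the walker stepped on. */
static int reuse_demo(struct entry *ent)
{
	struct queue q = { NULL };
	struct entry *pos;
	int i, steps = 0;

	for (i = NENT - 1; i >= 0; i--) {	/* list: ent[0], ent[1], ent[2] */
		ent[i].cpumask = 1UL;		/* every entry targets cpu 0 */
		ent[i].calls = 0;
		push_head(&q, &ent[i]);
	}

	pos = q.head;				/* cpu 0 handles ent[0] ... */
	visit(pos, 0);
	steps++;
	pos = pos->next;			/* ... and is now on ent[1] */
	visit(pos, 0);
	steps++;

	/* Meanwhile the owner re-uses ent[1] for a new request. */
	unlink_entry(&q, &ent[1]);
	ent[1].cpumask = 1UL;
	push_head(&q, &ent[1]);			/* list: ent[1], ent[0], ent[2] */

	/* The walker continues from ent[1]: it re-visits ent[0], then ent[2]. */
	for (pos = pos->next; pos; pos = pos->next) {
		visit(pos, 0);
		steps++;
	}
	return steps;
}
```

The walker steps on four entries instead of three, but the second visit to `ent[0]` is a no-op because its mask bit is already clear, and `ent[2]` is not skipped. Had `ent[1]` been appended at the tail instead, its `next` pointer would have become NULL and the walker would have missed `ent[2]`.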
    • generic IPI: simplify barriers and locking · 15d0d3b3
      Committed by Nick Piggin
      Simplify the barriers in generic remote function call interrupt
      code.
      
      Firstly, just unconditionally take the lock and check the list
      in the generic_call_function_single_interrupt IPI handler. As
      we've just taken an IPI here, the chances are fairly high that
      there will be work on the list for us, so do the locking
      unconditionally. This removes the tricky lockless list_empty
      check and dubious barriers. The change looks bigger than it is
      because it is just removing an outer loop.
      
      Secondly, clarify architecture specific IPI locking rules.
      Generic code has no tools to impose any sane ordering on IPIs if
      they go outside normal cache coherency, ergo the arch code must
      make them appear to obey cache coherency as a "memory operation"
      to initiate an IPI, and a "memory operation" to receive one.
      This way at least they can be reasoned about in generic code,
      and smp_mb used to provide ordering.
      
      The combination of these two changes means that explicit barriers
      can be taken out of queue handling for the single case -- shared
      data is explicitly locked, and ipi ordering must conform to
      that, so no barriers needed. An extra barrier is needed in the
      many handler, so as to ensure we load the list element after the
      IPI is received.
      
      Does any architecture actually *need* these barriers? For the
      initiator I could see it, but for the handler I would be
      surprised. So the other thing we could do for simplicity is just
      to require that, rather than just matching with cache coherency,
      we just require a full barrier before generating an IPI, and
      after receiving an IPI. In which case, the smp_mb()s can go
      away. But just for now, we'll be on the safe side and use the
      barriers (they're in the slow case anyway).
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: linux-arch@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
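The first change described above (always take the lock in the single-call handler, then work on a private copy of the list) might look roughly like this in user space. This is a sketch under assumptions: a C11 `atomic_flag` spinlock replaces the kernel spinlock, a singly linked list replaces `struct list_head`, and all names (`call_single_queue_sketch`, `enqueue()`, `single_interrupt()`, `bump()`) are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct csd {
	struct csd *next;
	void (*func)(void *);
	void *info;
};

struct call_single_queue_sketch {
	atomic_flag lock;
	struct csd *head;
};

static void queue_init(struct call_single_queue_sketch *q)
{
	atomic_flag_clear(&q->lock);
	q->head = NULL;
}

static void q_lock(struct call_single_queue_sketch *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin */
}

static void q_unlock(struct call_single_queue_sketch *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Sender side. Pushing at the head keeps the sketch short (LIFO order). */
static void enqueue(struct call_single_queue_sketch *q, struct csd *d)
{
	q_lock(q);
	d->next = q->head;
	q->head = d;
	q_unlock(q);
}

/*
 * Handler side: no lockless "is the list empty?" peek. Take the lock,
 * detach the whole list, drop the lock, then run the callbacks.
 * Returns the number of callbacks that ran.
 */
static int single_interrupt(struct call_single_queue_sketch *q)
{
	struct csd *list;
	int n = 0;

	q_lock(q);
	list = q->head;
	q->head = NULL;
	q_unlock(q);

	while (list) {
		struct csd *next = list->next;	/* read before func may reuse it */

		list->func(list->info);
		list = next;
		n++;
	}
	return n;
}

/* Example callback for trying the sketch out. */
static void bump(void *info)
{
	++*(int *)info;
}
```

Because the shared list is only touched under the lock, the lock's acquire and release ordering covers the queue itself, which is the reason the message gives for dropping the explicit barriers in the single case.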
  7. 31 Jan, 2009 1 commit
    • generic-ipi: use per cpu data for single cpu ipi calls · d7240b98
      Committed by Steven Rostedt
      The smp_call_function can be passed a wait parameter telling it to
      wait for all the functions running on other CPUs to complete before
      returning, or to return without waiting. Unfortunately, this is
      currently just a suggestion and not mandatory. That is, the
      smp_call_function can decide to wait instead of returning.
      
      The reason for this is because it uses kmalloc to allocate storage
      to send to the called CPU and that CPU will free it when it is done.
      But if we fail to allocate the storage, the stack is used instead.
      This means we must wait for the called CPU to finish before
      continuing.
      
      Unfortunately, some callers do not abide by this hint and act as if
      the non-wait option is mandatory. The MTRR code for instance will
      deadlock if the smp_call_function is set to wait. This is because
      the smp_call_function will wait for the other CPUs to finish their
      called functions, but those functions are waiting on the caller to
      continue.
      
      This patch changes the generic smp_call_function code to use per cpu
      variables if the allocation of the data fails for a single CPU call. The
      smp_call_function_many will fall back to the smp_call_function_single
      if it fails its alloc. The smp_call_function_single is modified
      to not force the wait state.
      
      Since we now are using a single data per cpu we must synchronize the
      callers to prevent a second caller modifying the data before the
      first called IPI functions complete. To do so, I added a flag to
      the call_single_data called CSD_FLAG_LOCK. When the single CPU is
      called (which can be called when a many call fails an alloc), we
      set the LOCK bit on this per cpu data. When the caller finishes
      it clears the LOCK bit.
      
      The caller must wait till the LOCK bit is cleared before setting
      it. When it is cleared, there is no IPI function using it.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
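The LOCK-bit handshake described above can be sketched with a C11 atomic flag word. This is an illustration, not the kernel implementation: `csd_sketch`, `csd_trylock()`, `csd_handle()` and `mark_ran()` are hypothetical, and the sketch assumes that the side which runs the function is the one that clears the bit.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CSD_FLAG_LOCK 0x01u

/* Hypothetical stand-in for the per-cpu call_single_data. */
struct csd_sketch {
	atomic_uint flags;
	void (*func)(void *);
	void *info;
};

/* Try to claim the per-cpu data; fails while a previous call still uses it. */
static bool csd_trylock(struct csd_sketch *d)
{
	unsigned int old = atomic_fetch_or_explicit(&d->flags, CSD_FLAG_LOCK,
						    memory_order_acquire);

	return !(old & CSD_FLAG_LOCK);
}

/* Blocking form: wait until the LOCK bit is clear, then set it. */
static void csd_lock(struct csd_sketch *d)
{
	while (!csd_trylock(d))
		;	/* spin */
}

static void csd_unlock(struct csd_sketch *d)
{
	atomic_fetch_and_explicit(&d->flags, ~CSD_FLAG_LOCK,
				  memory_order_release);
}

/* A caller that wants synchronous behaviour can wait on the same bit. */
static void csd_lock_wait(struct csd_sketch *d)
{
	while (atomic_load_explicit(&d->flags, memory_order_acquire) &
	       CSD_FLAG_LOCK)
		;	/* spin */
}

/* Target side (assumption of this sketch): run the function, then release. */
static void csd_handle(struct csd_sketch *d)
{
	d->func(d->info);
	csd_unlock(d);
}

/* Example callback for trying the sketch out. */
static void mark_ran(void *info)
{
	++*(int *)info;
}
```

A second sender that finds the bit set has to spin until the first call has been handled, which is what keeps it from overwriting `func` and `info` while they are still in use.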
  8. 01 Jan, 2009 1 commit
  9. 30 Dec, 2008 2 commits
    • cpumask: arch_send_call_function_ipi_mask: core · ce47d974
      Committed by Rusty Russell
      Impact: new API to reduce stack usage
      
      We're weaning the core code off handing cpumasks around on-stack.
      This introduces arch_send_call_function_ipi_mask().
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    • cpumask: smp_call_function_many() · 54b11e6d
      Committed by Rusty Russell
      Impact: Implementation change to remove cpumask_t from stack.
      
      Actually change smp_call_function_mask() to smp_call_function_many().
      We avoid cpumasks on the stack in this version.
      
      (S390 has its own version, but that's going away apparently).
      
      We have to do some dancing to figure out if 0 or 1 other cpus are in
      the mask supplied and the online mask without allocating a tmp
      cpumask.  It's still fairly cheap.
      
      We allocate the cpumask at the end of the call_function_data
      structure: if allocation fails we fallback to smp_call_function_single
      rather than using the baroque quiescing code (which needs a cpumask on
      stack).
      
      (Thanks to Hiroshi Shimamoto for spotting several bugs in previous versions!)
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Mike Travis <travis@sgi.com>
      Cc: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
      Cc: npiggin@suse.de
      Cc: axboe@kernel.dk
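The "dancing" to tell 0, 1 or many other cpus apart without a temporary cpumask can be sketched as follows. It is modelled on the idea in the message, not copied from the kernel: masks are plain 64-bit words here, and `next_and()`, `classify()` and `enum targets` are hypothetical names.

```c
#include <stdint.h>

#define NR_CPUS 64

enum targets { TARGET_NONE, TARGET_ONE, TARGET_MANY };

/*
 * First cpu above n that is set in both a and b, or NR_CPUS if there is
 * none. Pass n = -1 to start from cpu 0. The two masks are combined one
 * bit at a time, so no intermediate mask is ever stored.
 */
static int next_and(int n, uint64_t a, uint64_t b)
{
	int cpu;

	for (cpu = n + 1; cpu < NR_CPUS; cpu++)
		if ((a >> cpu) & (b >> cpu) & 1)
			return cpu;
	return NR_CPUS;
}

/*
 * How many cpus other than `me` are in mask and online?
 * For TARGET_ONE, *only receives that single cpu.
 */
static enum targets classify(uint64_t mask, uint64_t online, int me, int *only)
{
	int cpu, next;

	cpu = next_and(-1, mask, online);
	if (cpu == me)			/* skip ourselves */
		cpu = next_and(cpu, mask, online);
	if (cpu >= NR_CPUS)
		return TARGET_NONE;

	next = next_and(cpu, mask, online);
	if (next == me)			/* skip ourselves again */
		next = next_and(next, mask, online);
	if (next >= NR_CPUS) {
		*only = cpu;
		return TARGET_ONE;	/* a single-cpu call is enough */
	}
	return TARGET_MANY;
}
```

At most four scans are needed, and each stops at the first match, which fits the message's remark that the check is still fairly cheap.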
  10. 06 Nov, 2008 1 commit
    • generic-ipi: fix the smp_mb() placement · 561920a0
      Committed by Suresh Siddha
      smp_mb() is needed (to make the memory operations globally visible) before
      sending the ipi on the sender, and the receiver (on Alpha at least) needs
      smp_read_barrier_depends() in the handler before reading the call_single_queue
      list in a lock-free fashion.
      
      On x86, x2apic mode register accesses for sending IPI's don't have serializing
      semantics. So the need for smp_mb() before sending the IPI becomes more
      critical in x2apic mode.
      
      Remove the unnecessary smp_mb() in csd_flag_wait(), as the presence of that
      smp_mb() doesn't mean anything on the sender, when the ipi receiver is not
      doing anything special (like memory fence) after clearing the CSD_FLAG_WAIT.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
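The ordering requirement on the sender can be pictured with C11 fences. This is a user-space analogy only: a relaxed atomic flag stands in for the non-serializing x2apic register write, `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb(), and the receiver uses a full fence, which is stronger than the smp_read_barrier_depends() the message asks for. `ipi_channel`, `send_ipi()` and `receive_ipi()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ipi_channel {
	int payload;			/* ordinary memory set up before the IPI */
	atomic_bool ipi_pending;	/* stands in for the interrupt itself */
};

static void send_ipi(struct ipi_channel *ch, int value)
{
	ch->payload = value;

	/* Like smp_mb(): make the payload visible before the IPI goes out. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Relaxed on purpose: this write orders nothing by itself. */
	atomic_store_explicit(&ch->ipi_pending, true, memory_order_relaxed);
}

/* Returns true and fills *out if an "IPI" was pending. */
static bool receive_ipi(struct ipi_channel *ch, int *out)
{
	if (!atomic_exchange_explicit(&ch->ipi_pending, false,
				      memory_order_relaxed))
		return false;

	/* Order the payload read after noticing the IPI. */
	atomic_thread_fence(memory_order_seq_cst);
	*out = ch->payload;
	return true;
}
```

Without the fence in `send_ipi()`, nothing would stop the flag from becoming visible before the payload, which corresponds to the x2apic case the message singles out.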
  11. 26 Aug, 2008 1 commit
  12. 12 Aug, 2008 1 commit
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask(), fix · c2fc1198
      Committed by Nick Piggin
      > > Nick Piggin (1):
      > >       generic-ipi: fix stack and rcu interaction bug in
      > > smp_call_function_mask()
      >
      > I'm still not 100% sure that I have this patch right... I might have seen
      > a lockup trace implicating the smp call function path... which may have
      > been due to some other problem or a different bug in the new call function
      > code, but if some more people can take a look at it before merging?
      
      OK indeed it did have a couple of bugs. Firstly, I wasn't freeing the
      data properly in the alloc && wait case. Secondly, I wasn't resetting
      CSD_FLAG_WAIT in the for each cpu loop (so only the first CPU would
      wait).
      
      After those fixes, the patch boots and runs with the kmalloc commented
      out (so it always executes the slowpath).
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 11 Aug, 2008 1 commit
    • generic-ipi: fix stack and rcu interaction bug in smp_call_function_mask() · cc7a486c
      Committed by Nick Piggin
      * Venki Pallipadi <venkatesh.pallipadi@intel.com> wrote:
      
      > Found a OOPS on a big SMP box during an overnight reboot test with
      > upstream git.
      >
      > Suresh and I looked at the oops and looks like the root cause is in
      > generic_smp_call_function_interrupt() and smp_call_function_mask() with
      > wait parameter.
      >
      > The actual oops looked like
      >
      > [   11.277260] BUG: unable to handle kernel paging request at ffff8802ffffffff
      > [   11.277815] IP: [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.278155] PGD 202063 PUD 0
      > [   11.278576] Oops: 0010 [1] SMP
      > [   11.279006] CPU 5
      > [   11.279336] Modules linked in:
      > [   11.279752] Pid: 0, comm: swapper Not tainted 2.6.27-rc2-00020-g685d87f7 #290
      > [   11.280039] RIP: 0010:[<ffff8802ffffffff>]  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.280692] RSP: 0018:ffff88027f1f7f70  EFLAGS: 00010086
      > [   11.280976] RAX: 00000000ffffffff RBX: 0000000000000000 RCX: 0000000000000000
      > [   11.281264] RDX: 0000000000004f4e RSI: 0000000000000001 RDI: 0000000000000000
      > [   11.281624] RBP: ffff88027f1f7f98 R08: 0000000000000001 R09: ffffffff802509af
      > [   11.281925] R10: ffff8800280c2780 R11: 0000000000000000 R12: ffff88027f097d48
      > [   11.282214] R13: ffff88027f097d70 R14: 0000000000000005 R15: ffff88027e571000
      > [   11.282502] FS:  0000000000000000(0000) GS:ffff88027f1c3340(0000) knlGS:0000000000000000
      > [   11.283096] CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      > [   11.283382] CR2: ffff8802ffffffff CR3: 0000000000201000 CR4: 00000000000006e0
      > [   11.283760] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      > [   11.284048] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      > [   11.284337] Process swapper (pid: 0, threadinfo ffff88027f1f2000, task ffff88027f1f0640)
      > [   11.284936] Stack:  ffffffff80250963 0000000000000212 0000000000ee8c78 0000000000ee8a66
      > [   11.285802]  ffff88027e571550 ffff88027f1f7fa8 ffffffff8021adb5 ffff88027f1f3e40
      > [   11.286599]  ffffffff8020bdd6 ffff88027f1f3e40 <EOI>  ffff88027f1f3ef8 0000000000000000
      > [   11.287120] Call Trace:
      > [   11.287768]  <IRQ>  [<ffffffff80250963>] ? generic_smp_call_function_interrupt+0x61/0x12c
      > [   11.288354]  [<ffffffff8021adb5>] smp_call_function_interrupt+0x17/0x27
      > [   11.288744]  [<ffffffff8020bdd6>] call_function_interrupt+0x66/0x70
      > [   11.289030]  <EOI>  [<ffffffff8024ab3b>] ? clockevents_notify+0x19/0x73
      > [   11.289380]  [<ffffffff803b9b75>] ? acpi_idle_enter_simple+0x18b/0x1fa
      > [   11.289760]  [<ffffffff803b9b6b>] ? acpi_idle_enter_simple+0x181/0x1fa
      > [   11.290051]  [<ffffffff8053aeca>] ? cpuidle_idle_call+0x70/0xa2
      > [   11.290338]  [<ffffffff80209f61>] ? cpu_idle+0x5f/0x7d
      > [   11.290723]  [<ffffffff8060224a>] ? start_secondary+0x14d/0x152
      > [   11.291010]
      > [   11.291287]
      > [   11.291654] Code:  Bad RIP value.
      > [   11.292041] RIP  [<ffff8802ffffffff>] 0xffff8802ffffffff
      > [   11.292380]  RSP <ffff88027f1f7f70>
      > [   11.292741] CR2: ffff8802ffffffff
      > [   11.310951] ---[ end trace 137c54d525305f1c ]---
      >
      > The problem is with the following sequence of events:
      >
      > - CPU A calls smp_call_function_mask() for CPU B with wait parameter
      > - CPU A sets up the call_function_data on the stack and does an rcu add to
      >   call_function_queue
      > - CPU A waits until the WAIT flag is cleared
      > - CPU B gets the call function interrupt and starts going through the
      >   call_function_queue
      > - CPU C also gets some other call function interrupt and starts going through
      >   the call_function_queue
      > - CPU C, which is also going through the call_function_queue, starts referencing
      >   CPU A's stack, as that element is still in call_function_queue
      > - CPU B finishes the function call that CPU A set up and as there are no other
      >   references to it, rcu deletes the call_function_data (which was from CPU A
      >   stack)
      > - CPU B sees the wait flag and just clears the flag (no call_rcu to free)
      > - CPU A which was waiting on the flag continues executing and the stack
      >   contents change
      >
      > - CPU C is still in rcu_read section accessing the CPU A's stack sees
      >   inconsistent call_function_data and can try to execute
      >   function with some random pointer, causing stack corruption for A
      >   (by clearing the bits in mask field) and oops.
      
      Nice debugging work.
      
      I'd suggest something like the attached (boot tested) patch as the simple
      fix for now.
      
      I expect the benefits from the less synchronized, multiple-in-flight-data
      global queue will still outweigh the costs of dynamic allocations. But
      if worst comes to worst then we just go back to a globally synchronous
      one-at-a-time implementation, but that would be pretty sad!
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  14. 27 Jul, 2008 1 commit
  15. 16 Jul, 2008 1 commit
  16. 27 Jun, 2008 1 commit
  17. 26 Jun, 2008 2 commits