1. 10 May 2007 (8 commits)
    • workqueue: don't clear cwq->thread until it exits · 36aa9dfc
      Committed by Oleg Nesterov
      Pointed out by Srivatsa Vaddagiri.
      
      cleanup_workqueue_thread() sets cwq->thread = NULL and does kthread_stop().
      This breaks the "if (cwq->thread == current)" logic in flush_cpu_workqueue()
      and leads to deadlock.
      
      Kill the thread first, then clear cwq->thread. workqueue_mutex protects us
      from create_workqueue_thread() so we don't need cwq->lock.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      36aa9dfc
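
      A minimal sketch of the ordering this fix establishes (structure layout and
      the function's parameters are simplified here, not the exact patch):

        static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
        {
                struct task_struct *p = cwq->thread;

                if (p) {
                        kthread_stop(p);        /* kill the thread first ...      */
                        cwq->thread = NULL;     /* ... only then clear the pointer,
                                                 * so flush_cpu_workqueue() can
                                                 * still see cwq->thread == current
                                                 * while the worker drains its
                                                 * ->worklist */
                }
        }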
    • workqueue: fix flush_workqueue() vs CPU_DEAD race · d721304d
      Committed by Oleg Nesterov
      Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting
      the bug in my previous attempt.
      
      work->func() (and thus flush_workqueue()) must not use workqueue_mutex;
      taking it leads to deadlock when CPU_DEAD does kthread_stop(). However,
      without this mutex held we can't detect CPU_DEAD in progress, which can
      move pending works to another CPU while the dead one is not on cpu_online_map.
      
      Change flush_workqueue() to use for_each_possible_cpu(). This means that
      flush_cpu_workqueue() may hit CPU which is already dead. However in that
      case
      
      	!list_empty(&cwq->worklist) || cwq->current_work != NULL
      
      means that CPU_DEAD is in progress; it will do kthread_stop() + take_over_work(),
      so we can proceed and insert a barrier. We hold cwq->lock, so we are safe.
      
      Also, add migrate_sequence incremented by take_over_work() under cwq->lock.
      If take_over_work() happened before we checked this CPU, we should see the
      new value after spin_unlock().
      
      Further possible changes:
      
      	remove CPU_DEAD handling (along with take_over_work, migrate_sequence)
      	from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag.
      
      	CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates
      	the new thread if cwq->thread == NULL.
      
      This way the workqueue/cpu-hotplug interaction is almost zero, workqueue_mutex
      just protects "workqueues" list, CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d721304d
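
      A rough sketch of the resulting flush loop (the singlethread case and the
      per-cpu lookup are abbreviated; a sketch of the idea, not the exact patch):

        void flush_workqueue(struct workqueue_struct *wq)
        {
                int cpu;

                /* Walk every possible CPU, not just the online ones.  A dead
                 * CPU either has an empty ->worklist and no ->current_work
                 * (nothing to flush), or CPU_DEAD is in progress and
                 * take_over_work() will move the pending works, including our
                 * barrier, to a live CPU. */
                for_each_possible_cpu(cpu)
                        flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
        }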
    • workqueue: fix freezeable workqueues implementation · 319c2a98
      Committed by Oleg Nesterov
      Currently ->freezeable is per-cpu, which is wrong: CPU_UP_PREPARE creates a
      cwq->thread that is not freezeable.  Move ->freezeable to workqueue_struct.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      319c2a98
    • flush_cpu_workqueue: don't flush an empty ->worklist · 83c22520
      Committed by Oleg Nesterov
      Now that we have ->current_work, we can avoid adding a barrier and waiting
      for its completion when the cwq's queue is empty.
      
      Note: this change is also useful if we change flush_workqueue() to also
      check the dead CPUs.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      83c22520
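
      A hedged sketch of the early-out described above.  struct wq_barrier wraps
      a work_struct plus a completion (see the fuller sketch under "implement
      flush_work()" below), and insert_wq_barrier() is the queueing helper of
      that era, shown here from memory; the "flush ourselves" case where
      cwq->thread == current is omitted:

        static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
        {
                struct wq_barrier barr;
                int active = 0;

                spin_lock_irq(&cwq->lock);
                if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
                        insert_wq_barrier(cwq, &barr, 1);  /* tail of ->worklist */
                        active = 1;
                }
                spin_unlock_irq(&cwq->lock);

                /* Queue empty and nothing running: skip the barrier and the sleep. */
                if (active)
                        wait_for_completion(&barr.done);
        }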
    • flush_workqueue(): use preempt_disable to hold off cpu hotplug · edab2516
      Committed by Andrew Morton
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      edab2516
    • implement flush_work() · b89deed3
      Committed by Oleg Nesterov
      A basic problem with flush_scheduled_work() is that it blocks behind _all_
      presently-queued works, rather than just the work which the caller wants to
      flush.  If the caller holds some lock, and one of the queued works happens
      to want that lock as well, accidental deadlocks can occur.
      
      One example of this is the phy layer: it wants to flush work while holding
      rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
      deadlock because the linkwatch callback function takes rtnl_lock.
      
      So we implement a new function which will flush a *single* work - just the one
      which the caller wants to free up.  Thus we avoid the accidental deadlocks
      which can arise from unrelated subsystems' callbacks taking shared locks.
      
      flush_work() non-blockingly dequeues the work_struct which we want to kill,
      then it waits for its handler to complete on all CPUs.
      
      Add ->current_work to the "struct cpu_workqueue_struct", it points to
      currently running "struct work_struct". When flush_work(work) detects
      ->current_work == work, it inserts a barrier at the _head_ of ->worklist
      (and thus right _after_ that work) and waits for completion. This means
      that the next work fired on that CPU will be this barrier, or another
      barrier queued by concurrent flush_work(), so the caller of flush_work()
      will be woken before any "regular" work has a chance to run.
      
      When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
      against CPU hotplug), CPU may go away. But in that case take_over_work() will
      move a barrier we queued to another CPU; it will be fired sometime, and
      wait_on_work() will be woken.
      
      Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
      take_over_work(), so cwq->thread should complete its ->worklist (and thus
      the barrier), because currently we don't check kthread_should_stop() in
      run_workqueue(). But even if we did, everything should be ok.
      
      [akpm@osdl.org: cleanup]
      [akpm@osdl.org: add flush_work_keventd() wrapper]
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b89deed3
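
      A simplified sketch of the barrier trick described above, assuming the
      post-2006 work API (two-argument INIT_WORK() with a work_func_t callback);
      the dequeue-if-pending path of flush_work() and the per-CPU iteration are
      left out:

        struct wq_barrier {
                struct work_struct      work;
                struct completion       done;
        };

        static void wq_barrier_func(struct work_struct *work)
        {
                struct wq_barrier *barr = container_of(work, struct wq_barrier, work);

                complete(&barr->done);
        }

        static void wait_on_work(struct cpu_workqueue_struct *cwq,
                                 struct work_struct *work)
        {
                struct wq_barrier barr;
                int running = 0;

                INIT_WORK(&barr.work, wq_barrier_func);
                init_completion(&barr.done);

                spin_lock_irq(&cwq->lock);
                if (cwq->current_work == work) {
                        /* Head insert: the barrier fires right after *work*,
                         * before any other queued work gets a chance to run. */
                        list_add(&barr.work.entry, &cwq->worklist);
                        running = 1;
                }
                spin_unlock_irq(&cwq->lock);

                if (running)
                        wait_for_completion(&barr.done);
        }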
    • reimplement flush_workqueue() · fc2e4d70
      Committed by Oleg Nesterov
      Remove ->remove_sequence, ->insert_sequence, and ->work_done from struct
      cpu_workqueue_struct.  To implement flush_workqueue() we can queue a
      barrier work on each CPU and wait for its completion.
      
      The barrier is queued under workqueue_mutex to ensure that per cpu
      wq->cpu_wq is alive; we drop this mutex before going to sleep.  If a CPU goes
      down while we are waiting for completion, take_over_work() will move the
      barrier to another CPU, and the handler will wake us up eventually.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fc2e4d70
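
      A minimal sketch of the per-CPU flush this reimplementation uses, reusing
      the wq_barrier helper sketched under b89deed3 above; the more_work
      waitqueue name is assumed from the workqueue code of that era.  This is
      the baseline that the later "don't flush an empty ->worklist" change
      (83c22520, above) optimizes:

        static void flush_cpu_workqueue(struct cpu_workqueue_struct *cwq)
        {
                struct wq_barrier barr;

                INIT_WORK(&barr.work, wq_barrier_func);
                init_completion(&barr.done);

                spin_lock_irq(&cwq->lock);
                list_add_tail(&barr.work.entry, &cwq->worklist);
                wake_up(&cwq->more_work);       /* kick the worker thread */
                spin_unlock_irq(&cwq->lock);

                /* Sleep until the barrier has run behind everything queued so
                 * far; no ->insert_sequence / ->remove_sequence counting. */
                wait_for_completion(&barr.done);
        }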
    • schedule_on_each_cpu(): use preempt_disable() · e18f3ffb
      Committed by Andrew Morton
      We take workqueue_mutex in there to keep CPU hotplug away.  But
      preempt_disable() will suffice for that.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e18f3ffb
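
      A hedged sketch of the change, assuming the one-argument
      schedule_on_each_cpu() signature of that period; keventd_wq and
      __queue_work() are workqueue-internal names and may differ in the actual
      tree:

        int schedule_on_each_cpu(work_func_t func)
        {
                int cpu;
                struct work_struct *works;

                works = alloc_percpu(struct work_struct);
                if (!works)
                        return -ENOMEM;

                /* preempt_disable() is enough to hold off cpu hotplug while the
                 * per-cpu works are queued; workqueue_mutex is not needed. */
                preempt_disable();
                for_each_online_cpu(cpu) {
                        INIT_WORK(per_cpu_ptr(works, cpu), func);
                        __queue_work(per_cpu_ptr(keventd_wq->cpu_wq, cpu),
                                     per_cpu_ptr(works, cpu));
                }
                preempt_enable();

                flush_workqueue(keventd_wq);
                free_percpu(works);
                return 0;
        }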
  2. 17 Feb 2007 (1 commit)
    • [PATCH] Add debugging feature /proc/timer_stat · 82f67cd9
      Committed by Ingo Molnar
      Add /proc/timer_stats support: debugging feature to profile timer expiration.
      The starting site, the process/PID and the expiration function are all captured.
      This allows the quick identification of timer event sources in a system.
      
      Sample output:
      
      # echo 1 > /proc/timer_stats
      # cat /proc/timer_stats
      Timer Stats Version: v0.1
      Sample period: 4.010 s
        24,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
        11,     0 swapper          sk_reset_timer (tcp_delack_timer)
         6,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
        17,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
         4,  2050 pcscd            do_nanosleep (hrtimer_wakeup)
         5,  4179 sshd             sk_reset_timer (tcp_write_timer)
         4,  2248 yum-updatesd     schedule_timeout (process_timeout)
        18,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         3,     0 swapper          sk_reset_timer (tcp_delack_timer)
         1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
         2,     1 swapper          e1000_up (e1000_watchdog)
         1,     1 init             schedule_timeout (process_timeout)
      100 total events, 25.24 events/sec
      
      [ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
      [bunk@stusta.de: nr_entries can become static]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82f67cd9
  3. 12 Feb 2007 (1 commit)
  4. 23 Dec 2006 (1 commit)
  5. 21 Dec 2006 (1 commit)
  6. 17 Dec 2006 (1 commit)
    • Make workqueue bit operations work on "atomic_long_t" · a08727ba
      Committed by Linus Torvalds
      On architectures where the atomicity of the bit operations is handled by
      external means (ie a separate spinlock to protect concurrent accesses),
      just doing a direct assignment on the workqueue data field (as done by
      commit 4594bf15) can cause the
      assignment to be lost due to lack of serialization with the bitops on
      the same word.
      
      So we need to serialize the assignment with the locks on those
      architectures (notably older ARM chips, PA-RISC and sparc32).
      
      So rather than using an "unsigned long", let's use "atomic_long_t",
      which already has a safe assignment operation (atomic_long_set()) on
      such architectures.
      
      This requires that the atomic operations use the same atomicity locks as
      the bit operations do, but that is largely the case anyway.  Sparc32
      will probably need fixing.
      
      Architectures (including modern ARM with LL/SC) that implement sane
      atomic operations for SMP won't see any of this matter.
      
      Cc: Russell King <rmk+lkml@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David Miller <davem@davemloft.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a08727ba
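
      A minimal sketch of the resulting layout; the point is that the combined
      word is an atomic_long_t, so plain stores go through atomic_long_set():

        struct work_struct {
                atomic_long_t data;             /* cwq pointer + flag bits, one word */
                struct list_head entry;
                work_func_t func;
        };

        static inline void set_wq_data(struct work_struct *work, unsigned long new)
        {
                /* On architectures that emulate bitops with spinlocks, this is
                 * serialized against concurrent bitops on ->data, assuming (as
                 * the commit notes) that atomics and bitops share the same
                 * atomicity locks.  A raw store could be lost. */
                atomic_long_set(&work->data, new);
        }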
  7. 10 Dec 2006 (1 commit)
    • [PATCH] WorkStruct: Use direct assignment rather than cmpxchg() · 4594bf15
      Committed by David Howells
      Use direct assignment rather than cmpxchg() as the latter is unavailable
      and unimplementable on some platforms and is actually unnecessary.
      
      The use of cmpxchg() was to guard against two possibilities, neither of
      which can actually occur:
      
       (1) The pending flag may have been unset or may be cleared.  However, given
           where it's called, the pending flag is _always_ set.  I don't think it
           can be unset whilst we're in set_wq_data().
      
           Once the work is enqueued to be actually run, the only way off the queue
           is for it to be actually run.
      
           If it's a delayed work item, then the bit can't be cleared by the timer
           because we haven't started the timer yet.  Also, the pending bit can't be
           cleared by cancelling the delayed work _until_ the work item has had its
           timer started.
      
       (2) The workqueue pointer might change.  This can only happen in two cases:
      
           (a) The work item has just been queued to actually run, and so we're
               protected by the appropriate workqueue spinlock.
      
           (b) A delayed work item is being queued, and so the timer hasn't been
           	 started yet, and so no one else knows about the work item or can
           	 access it (the pending bit protects us).
      
           Besides, set_wq_data() _sets_ the workqueue pointer unconditionally, so
           it can be assigned instead.
      
      So, replacing the set_wq_data() with a straight assignment would be okay
      in most cases.
      
      The problem is where we end up tangling with test_and_set_bit() emulated
      using spinlocks, and even then it's not a problem _provided_
      test_and_set_bit() doesn't attempt to modify the word if the bit was
      set.
      
      If that's a problem, then a bitops-proofed assignment will be required -
      equivalent to atomic_set() vs other atomic_xxx() ops.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4594bf15
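
      A sketch of the simplification being argued for; the word is shown as a
      plain unsigned long field named data, and WORK_STRUCT_PENDING is used as
      an illustrative flag name — the exact identifiers in that tree may differ:

        static inline void set_wq_data(struct work_struct *work,
                                       struct cpu_workqueue_struct *cwq)
        {
                unsigned long new = (unsigned long)cwq | (1UL << WORK_STRUCT_PENDING);

                /* The pending bit is already set and nobody else can reach this
                 * work item here, so a plain store replaces the old cmpxchg()
                 * retry loop:
                 *   do { old = work->data; }
                 *   while (cmpxchg(&work->data, old, new) != old);          */
                work->data = new;
        }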
  8. 08 Dec 2006 (4 commits)
  9. 22 Nov 2006 (4 commits)
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      Committed by David Howells
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
      Signed-Off-By: David Howells <dhowells@redhat.com>
      65f27f38
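
      A small usage sketch of the new calling convention (my_device and its
      fields are hypothetical):

        struct my_device {
                int                     events;
                struct work_struct      work;   /* embedded; no separate context pointer */
        };

        static void my_device_work(struct work_struct *work)
        {
                /* The callback receives the work_struct itself and recovers its
                 * container with container_of(). */
                struct my_device *dev = container_of(work, struct my_device, work);

                dev->events++;
        }

        /* Setup side: INIT_WORK(&dev->work, my_device_work); schedule_work(&dev->work); */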
    • WorkStruct: Merge the pending bit into the wq_data pointer · 365970a1
      Committed by David Howells
      Reclaim a word from the size of the work_struct by folding the pending bit and
      the wq_data pointer together.  This shouldn't cause misalignment problems as
      all pointers should be at least 4-byte aligned.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      365970a1
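
      A rough sketch of the folding; the combined word is shown here as
      ->management, and the mask/bit names are illustrative:

        struct work_struct {
                unsigned long management;       /* cwq pointer | pending bit, one word */
                struct list_head entry;
                /* function pointer and context data omitted in this sketch */
        };

        #define WORK_STRUCT_PENDING      0       /* bit 0: work is pending */
        #define WORK_STRUCT_WQ_DATA_MASK (~3UL)  /* pointers are >= 4-byte aligned */

        static inline struct cpu_workqueue_struct *get_wq_data(struct work_struct *work)
        {
                return (void *)(work->management & WORK_STRUCT_WQ_DATA_MASK);
        }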
    • WorkStruct: Typedef the work function prototype · 6bb49e59
      Committed by David Howells
      Define a type for the work function prototype.  It's not only kept in the
      work_struct struct, it's also passed as an argument to several functions.
      
      This makes it easier to change it.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      6bb49e59
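
      At this point the callback still took an opaque context pointer (the
      work_struct-based signature arrives with 65f27f38 above), so the typedef
      was roughly:

        typedef void (*work_func_t)(void *data);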
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      Committed by David Howells
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
      Signed-Off-By: David Howells <dhowells@redhat.com>
      52bad64d
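
      A minimal sketch of the split (field list abbreviated; initializer macros
      and the delayed-queueing helpers omitted):

        struct work_struct {
                unsigned long pending;
                struct list_head entry;
                void (*func)(void *);
                void *data;
        };                                      /* the timer_list is gone from here */

        struct delayed_work {
                struct work_struct      work;
                struct timer_list       timer;  /* only delayed items pay for this */
        };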
  10. 29 Oct 2006 (1 commit)
  11. 12 Oct 2006 (1 commit)
  12. 04 Oct 2006 (1 commit)
  13. 15 Aug 2006 (1 commit)
  14. 01 Aug 2006 (1 commit)
  15. 04 Jul 2006 (1 commit)
  16. 30 Jun 2006 (2 commits)
  17. 28 Jun 2006 (1 commit)
    • [PATCH] cpu hotplug: revert init patch submitted for 2.6.17 · 9c7b216d
      Committed by Chandra Seetharaman
      In 2.6.17, there was a problem with cpu_notifiers and XFS.  I provided a
      band-aid solution to solve that problem.  In the process, I undid all the
      changes you both were making to ensure that these notifiers were available
      only at init time (unless CONFIG_HOTPLUG_CPU is defined).
      
      We deferred the real fix to 2.6.18.  Here is a set of patches that fixes the
      XFS problem cleanly and makes the cpu notifiers available only at init time
      (unless CONFIG_HOTPLUG_CPU is defined).
      
      If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
      time.
      
      This patch reverts the notifier_call changes made in 2.6.17
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9c7b216d
  18. 26 Jun 2006 (2 commits)
  19. 23 Jun 2006 (1 commit)
  20. 26 Apr 2006 (1 commit)
  21. 28 Feb 2006 (1 commit)
    • [SCSI] add execute_in_process_context() API · 1fa44eca
      Committed by James Bottomley
      We have several points in the SCSI stack (primarily for our device
      functions) where we need to guarantee process context, but (given the
      place where the last reference was released) we cannot guarantee this.
      
      This API gets around the issue by executing the function directly if
      the caller has process context, but scheduling a workqueue to execute
      in process context if the caller doesn't have it.
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
      1fa44eca
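
      A hedged sketch of the API's behaviour, using the three-argument
      INIT_WORK() of that era (early 2006); treat the exact signature and the
      in_interrupt() test as assumptions:

        int execute_in_process_context(void (*fn)(void *data), void *data,
                                       struct execute_work *ew)
        {
                if (!in_interrupt()) {
                        fn(data);               /* already in process context: run now */
                        return 0;
                }

                /* No process context here: defer to keventd via the
                 * caller-supplied execute_work, which embeds a work_struct. */
                INIT_WORK(&ew->work, fn, data);
                schedule_work(&ew->work);
                return 1;
        }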
  22. 15 Jan 2006 (1 commit)
  23. 09 Jan 2006 (3 commits)
    • [PATCH] fix workqueue oops during cpu offline · f756d5e2
      Committed by Nathan Lynch
      Use first_cpu(cpu_possible_map) for the single-thread workqueue case.  We
      used to hardcode 0, but that broke on systems where !cpu_possible(0) when
      workqueue_struct->cpu_workqueue_struct was changed from a static array to
      alloc_percpu.
      
      Commit id bce61dd4 ("Fix hardcoded cpu=0 in
      workqueue for per_cpu_ptr() calls") fixed that for Ben's funky sparc64
      system, but it regressed my Power5.  Offlining cpu 0 oopses upon the next
      call to queue_work for a single-thread workqueue, because now we try to
      manipulate per_cpu_ptr(wq->cpu_wq, 1), which is uninitialized.
      
      So we need to establish an unchanging "slot" for single-thread workqueues
      which will have a valid percpu allocation.  Since alloc_percpu keys off of
      cpu_possible_map, which must not change after initialization, make this
      slot == first_cpu(cpu_possible_map).
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f756d5e2
    • [PATCH] Unchecked alloc_percpu() return in __create_workqueue() · 676121fc
      Committed by Ben Collins
      __create_workqueue() was not checking the return value of alloc_percpu(),
      so a NULL dereference was possible.
      Signed-off-by: Ben Collins <bcollins@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      676121fc
    • [PATCH] add schedule_on_each_cpu() · 15316ba8
      Committed by Christoph Lameter
      swap migration's isolate_lru_page() currently uses an IPI to notify other
      processors that the lru caches need to be drained if the page cannot be
      found on the LRU.  The IPI interrupt may interrupt a processor that is just
      processing lru requests and cause a race condition.
      
      This patch introduces a new function, schedule_on_each_cpu(), that uses
      keventd to run the LRU draining on each processor.  Processors disable
      preemption when dealing with the LRU caches (these are per processor), and
      thus executing LRU draining from another process is safe.
      
      Thanks to Lee Schermerhorn <lee.schermerhorn@hp.com> for finding this race
      condition.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      15316ba8
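
      A usage sketch from the caller's side, assuming the original two-argument
      signature int schedule_on_each_cpu(void (*func)(void *), void *info);
      lru_drain_per_cpu() and drain_all_lru_caches() are hypothetical stand-ins
      for the mm-side code:

        static void lru_drain_per_cpu(void *unused)
        {
                /* Runs in keventd on each CPU in turn, so the per-cpu LRU
                 * pagevecs are drained locally, without IPIs into arbitrary
                 * contexts. */
                lru_add_drain();
        }

        static int drain_all_lru_caches(void)
        {
                return schedule_on_each_cpu(lru_drain_per_cpu, NULL);
        }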