1. May 10, 2007 (12 commits)
    • workqueue: make cancel_rearming_delayed_workqueue() work on idle dwork · dfb4b82e
      Committed by Oleg Nesterov
      cancel_rearming_delayed_workqueue(dwork) will hang forever if dwork was not
      scheduled, because in that case cancel_delayed_work()->del_timer_sync() never
      returns true.
      
      I don't know of any callers that are actually affected, but this behaviour is
      inconvenient, and the fix is very simple.
      
      Q: it looks like we don't need the "struct workqueue_struct *wq" parameter.  If
      the timer was aborted successfully, get_wq_data() == wq.  Is it worth adding a
      new function?
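
      For reference, a rough sketch of the loop that hangs; this is only a sketch of
      the pre-fix shape of kernel/workqueue.c, and the patch itself may differ in
      detail:

      	/*
      	 * If dwork was never queued, its timer is not pending, so
      	 * cancel_delayed_work() (via del_timer_sync()) keeps returning 0
      	 * and this loop never terminates.
      	 */
      	void cancel_rearming_delayed_workqueue(struct workqueue_struct *wq,
      					       struct delayed_work *dwork)
      	{
      		while (!cancel_delayed_work(dwork))
      			flush_workqueue(wq);
      	}

      The fix, in essence, is to recognise the never-scheduled (idle) dwork up front
      instead of waiting for the timer cancellation to succeed.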
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: don't save interrupts in run_workqueue() · f293ea92
      Committed by Oleg Nesterov
      work->func() may sleep, so it is a bug to call run_workqueue() with irqs disabled.
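
      In other words, the per-cwq lock no longer needs to save and restore the
      caller's interrupt state.  A minimal sketch of the locking pattern (not the
      literal diff):

      	unsigned long flags;

      	/* Before: saving flags silently tolerates callers with irqs off. */
      	spin_lock_irqsave(&cwq->lock, flags);
      	/* ... pick the next work off cwq->worklist ... */
      	spin_unlock_irqrestore(&cwq->lock, flags);

      	/* After: run_workqueue() requires process context with irqs on,
      	 * since work->func() may sleep. */
      	spin_lock_irq(&cwq->lock);
      	/* ... pick the next work off cwq->worklist ... */
      	spin_unlock_irq(&cwq->lock);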
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: kill run_scheduled_work() · 7097a87a
      Committed by Oleg Nesterov
      Because it has no callers.
      
      Actually, I think the whole idea of run_scheduled_work() was not right; it is not
      good to mix "unqueue this work" and "execute its ->func()" in one function.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: don't migrate pending works from the dead CPU · 3af24433
      Committed by Oleg Nesterov
      Currently CPU_DEAD uses kthread_stop() to stop cwq->thread and then
      transfers cwq->worklist to another CPU.  However, it is very unlikely that
      worker_thread() will notice kthread_should_stop() before flushing
      cwq->worklist.  It is only possible if worker_thread() was preempted after
      run_workqueue(cwq), a new work_struct was added, and CPU_DEAD happened
      before cwq->thread has a chance to run.
      
      This means that take_over_work() mostly adds unneeded complications.  Note
      also that kthread_stop() is not good per se: wake_up_process() may confuse
      work->func() if it sleeps waiting for some event.
      
      Remove take_over_work() and migrate_sequence complications.  CPU_DEAD sets
      the cwq->should_stop flag (introduced by this patch) and waits for
      cwq->thread to flush cwq->worklist and exit.  Because the dead CPU is not
      on cpu_online_map, no more works can be added to that cwq.
      
      cpu_populated_map was introduced to optimize for_each_possible_cpu(); it is
      not strictly needed and serves mostly as documentation.
      
      Saves 418 bytes.
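
      A minimal sketch of the resulting worker loop; field and helper names are
      illustrative, and the real patch also handles freezing and the singlethread
      case:

      	static int worker_thread(void *__cwq)
      	{
      		struct cpu_workqueue_struct *cwq = __cwq;
      		DEFINE_WAIT(wait);

      		for (;;) {
      			prepare_to_wait(&cwq->more_work, &wait, TASK_INTERRUPTIBLE);
      			if (!cwq->should_stop && list_empty(&cwq->worklist))
      				schedule();
      			finish_wait(&cwq->more_work, &wait);

      			run_workqueue(cwq);	/* drains cwq->worklist */

      			/* The dead CPU is gone from cpu_online_map, so no new
      			 * works can arrive; an empty list means we are done. */
      			if (cwq->should_stop && list_empty(&cwq->worklist))
      				break;
      		}
      		return 0;
      	}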
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: don't clear cwq->thread until it exits · 36aa9dfc
      Committed by Oleg Nesterov
      Pointed out by Srivatsa Vaddagiri.
      
      cleanup_workqueue_thread() sets cwq->thread = NULL and does kthread_stop().
      This breaks the "if (cwq->thread == current)" logic in flush_cpu_workqueue()
      and leads to deadlock.
      
      Kill the thread first, then clear cwq->thread. workqueue_mutex protects us
      from create_workqueue_thread() so we don't need cwq->lock.
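
      A sketch of the corrected ordering (signature simplified):

      	static void cleanup_workqueue_thread(struct cpu_workqueue_struct *cwq)
      	{
      		if (cwq->thread) {
      			/* Stop the thread first; only then drop the pointer, so
      			 * flush_cpu_workqueue() can still see "cwq->thread ==
      			 * current" while the thread is flushing and exiting. */
      			kthread_stop(cwq->thread);
      			cwq->thread = NULL;
      		}
      	}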
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: fix flush_workqueue() vs CPU_DEAD race · d721304d
      Committed by Oleg Nesterov
      Many thanks to Srivatsa Vaddagiri for the helpful discussion and for spotting
      the bug in my previous attempt.
      
      work->func() (and thus flush_workqueue()) must not use workqueue_mutex;
      this leads to deadlock when CPU_DEAD does kthread_stop().  However, without
      this mutex held we can't detect a CPU_DEAD in progress, which can move pending
      works to another CPU while the dead one is not on cpu_online_map.
      
      Change flush_workqueue() to use for_each_possible_cpu().  This means that
      flush_cpu_workqueue() may hit a CPU which is already dead.  However, in that
      case
      
      	!list_empty(&cwq->worklist) || cwq->current_work != NULL
      
      means that CPU_DEAD is in progress: it will do kthread_stop() + take_over_work(),
      so we can proceed and insert a barrier.  We hold cwq->lock, so we are safe.
      
      Also, add migrate_sequence incremented by take_over_work() under cwq->lock.
      If take_over_work() happened before we checked this CPU, we should see the
      new value after spin_unlock().
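
      A sketch of the resulting flush_workqueue() loop (singlethread handling
      omitted):

      	void flush_workqueue(struct workqueue_struct *wq)
      	{
      		int cpu;

      		might_sleep();
      		/* Possible, not online: a CPU that has just died must still
      		 * be flushed; flush_cpu_workqueue() detects the
      		 * CPU_DEAD-in-progress case under cwq->lock as above. */
      		for_each_possible_cpu(cpu)
      			flush_cpu_workqueue(per_cpu_ptr(wq->cpu_wq, cpu));
      	}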
      
      Further possible changes:
      
      	remove CPU_DEAD handling (along with take_over_work, migrate_sequence)
      	from workqueue.c. CPU_DEAD just sets cwq->please_exit_after_flush flag.
      
      	CPU_UP_PREPARE->create_workqueue_thread() clears this flag, and creates
      	the new thread if cwq->thread == NULL.
      
      This way the workqueue/cpu-hotplug interaction is almost zero: workqueue_mutex
      just protects the "workqueues" list, and CPU_LOCK_ACQUIRE/CPU_LOCK_RELEASE go away.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • workqueue: fix freezeable workqueues implementation · 319c2a98
      Committed by Oleg Nesterov
      Currently ->freezeable is per-cpu; this is wrong.  CPU_UP_PREPARE creates a
      cwq->thread which is not freezeable.  Move ->freezeable to workqueue_struct.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: "Pallipadi, Venkatesh" <venkatesh.pallipadi@intel.com>
      Cc: Gautham shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • flush_cpu_workqueue: don't flush an empty ->worklist · 83c22520
      Committed by Oleg Nesterov
      Now that we have ->current_work we can avoid adding a barrier and waiting
      for its completion when the cwq's queue is empty.
      
      Note: this change is also useful if we change flush_workqueue() to also
      check the dead CPUs.
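
      A sketch of the check this enables inside flush_cpu_workqueue() (variable
      names illustrative):

      	int active = 0;

      	spin_lock_irq(&cwq->lock);
      	if (!list_empty(&cwq->worklist) || cwq->current_work != NULL)
      		active = 1;	/* something queued or running: barrier needed */
      	spin_unlock_irq(&cwq->lock);

      	if (!active)
      		return;		/* nothing to flush, skip the barrier and the wait */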
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • flush_workqueue(): use preempt_disable to hold off cpu hotplug · edab2516
      Committed by Andrew Morton
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Gautham Shenoy <ego@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • implement flush_work() · b89deed3
      Committed by Oleg Nesterov
      A basic problem with flush_scheduled_work() is that it blocks behind _all_
      presently-queued works, rather than just the work which the caller wants to
      flush.  If the caller holds some lock, and one of the queued works happens
      to want that lock as well, then accidental deadlocks can occur.
      
      One example of this is the phy layer: it wants to flush work while holding
      rtnl_lock().  But if a linkwatch event happens to be queued, the phy code will
      deadlock because the linkwatch callback function takes rtnl_lock.
      
      So we implement a new function which will flush a *single* work - just the one
      which the caller wants to free up.  Thus we avoid the accidental deadlocks
      which can arise from unrelated subsystems' callbacks taking shared locks.
      
      flush_work() non-blockingly dequeues the work_struct which we want to kill,
      then it waits for its handler to complete on all CPUs.
      
      Add ->current_work to the "struct cpu_workqueue_struct"; it points to the
      currently running "struct work_struct".  When flush_work(work) detects
      ->current_work == work, it inserts a barrier at the _head_ of ->worklist
      (and thus right _after_ that work) and waits for completion.  This means
      that the next work fired on that CPU will be this barrier, or another
      barrier queued by a concurrent flush_work(), so the caller of flush_work()
      will be woken before any "regular" work has a chance to run.
      
      When wait_on_work() unlocks workqueue_mutex (or whatever we choose to protect
      against CPU hotplug), the CPU may go away.  But in that case take_over_work()
      will move the barrier we queued to another CPU; it will be fired sometime, and
      wait_on_work() will be woken.
      
      Actually, we are doing cleanup_workqueue_thread()->kthread_stop() before
      take_over_work(), so cwq->thread should complete its ->worklist (and thus
      the barrier), because currently we don't check kthread_should_stop() in
      run_workqueue(). But even if we did, everything should be ok.
      
      [akpm@osdl.org: cleanup]
      [akpm@osdl.org: add flush_work_keventd() wrapper]
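
      A sketch of the barrier machinery described above; the struct and helper
      names are illustrative, not necessarily the ones used in the patch:

      	struct wq_barrier {
      		struct work_struct	work;
      		struct completion	done;
      	};

      	static void wq_barrier_func(struct work_struct *work)
      	{
      		struct wq_barrier *barr =
      			container_of(work, struct wq_barrier, work);

      		complete(&barr->done);
      	}

      	/* In flush_work(), roughly: */
      	struct wq_barrier barr;
      	int running = 0;

      	INIT_WORK(&barr.work, wq_barrier_func);
      	init_completion(&barr.done);

      	spin_lock_irq(&cwq->lock);
      	if (cwq->current_work == work) {
      		/* Head of ->worklist == right after the running work, so the
      		 * flusher is woken before any "regular" work runs. */
      		list_add(&barr.work.entry, &cwq->worklist);
      		running = 1;
      	}
      	spin_unlock_irq(&cwq->lock);

      	if (running)
      		wait_for_completion(&barr.done);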
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • reimplement flush_workqueue() · fc2e4d70
      Committed by Oleg Nesterov
      Remove ->remove_sequence, ->insert_sequence, and ->work_done from struct
      cpu_workqueue_struct.  To implement flush_workqueue() we can queue a
      barrier work on each CPU and wait for its completion.
      
      The barrier is queued under workqueue_mutex to ensure that the per-cpu
      wq->cpu_wq is alive; we drop this mutex before going to sleep.  If a CPU goes
      down while we are waiting for completion, take_over_work() will move the
      barrier to another CPU, and the handler will wake us up eventually.
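
      The barrier is the same kind of completion-backed work sketched under the
      flush_work() entry above, only queued at the tail of each CPU's ->worklist.
      Roughly (lock and field names as in this era's kernel/workqueue.c are
      assumed):

      	struct wq_barrier barr;

      	INIT_WORK(&barr.work, wq_barrier_func);
      	init_completion(&barr.done);

      	spin_lock_irq(&cwq->lock);
      	list_add_tail(&barr.work.entry, &cwq->worklist);
      	wake_up(&cwq->more_work);
      	spin_unlock_irq(&cwq->lock);

      	mutex_unlock(&workqueue_mutex);		/* never sleep holding it */
      	wait_for_completion(&barr.done);
      	mutex_lock(&workqueue_mutex);		/* re-take it for the next CPU */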
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • schedule_on_each_cpu(): use preempt_disable() · e18f3ffb
      Committed by Andrew Morton
      We take workqueue_mutex in there to keep CPU hotplug away.  But
      preempt_disable() will suffice for that.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Feb 17, 2007 (1 commit)
    • [PATCH] Add debugging feature /proc/timer_stat · 82f67cd9
      Committed by Ingo Molnar
      Add /proc/timer_stats support: a debugging feature to profile timer expiration.
      The starting site, the process/PID, and the expiration function are all captured.
      This allows quick identification of timer event sources in a system.
      
      Sample output:
      
      # echo 1 > /proc/timer_stats
      # cat /proc/timer_stats
      Timer Stats Version: v0.1
      Sample period: 4.010 s
        24,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
        11,     0 swapper          sk_reset_timer (tcp_delack_timer)
         6,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
        17,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         2,     1 swapper          queue_delayed_work_on (delayed_work_timer_fn)
         4,  2050 pcscd            do_nanosleep (hrtimer_wakeup)
         5,  4179 sshd             sk_reset_timer (tcp_write_timer)
         4,  2248 yum-updatesd     schedule_timeout (process_timeout)
        18,     0 swapper          hrtimer_restart_sched_tick (hrtimer_sched_tick)
         3,     0 swapper          sk_reset_timer (tcp_delack_timer)
         1,     1 swapper          neigh_table_init_no_netlink (neigh_periodic_timer)
         2,     1 swapper          e1000_up (e1000_watchdog)
         1,     1 init             schedule_timeout (process_timeout)
      100 total events, 25.24 events/sec
      
      [ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
      [bunk@stusta.de: nr_entries can become static]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Feb 12, 2007 (1 commit)
  4. Dec 23, 2006 (1 commit)
  5. Dec 21, 2006 (1 commit)
  6. Dec 17, 2006 (1 commit)
    • Make workqueue bit operations work on "atomic_long_t" · a08727ba
      Committed by Linus Torvalds
      On architectures where the atomicity of the bit operations is handled by
      external means (ie a separate spinlock to protect concurrent accesses),
      just doing a direct assignment on the workqueue data field (as done by
      commit 4594bf15) can cause the
      assignment to be lost due to lack of serialization with the bitops on
      the same word.
      
      So we need to serialize the assignment with the locks on those
      architectures (notably older ARM chips, PA-RISC and sparc32).
      
      So rather than using an "unsigned long", let's use "atomic_long_t",
      which already has a safe assignment operation (atomic_long_set()) on
      such architectures.
      
      This requires that the atomic operations use the same atomicity locks as
      the bit operations do, but that is largely the case anyway.  Sparc32
      will probably need fixing.
      
      Architectures (including modern ARM with LL/SC) that implement sane
      atomic operations for SMP won't see any of this matter.
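
      A sketch of what this looks like for the work_struct; constant and helper
      shapes are illustrative:

      	struct work_struct {
      		atomic_long_t data;	/* flag bits + cwq pointer in one word */
      		struct list_head entry;
      		work_func_t func;
      	};

      	/* A plain store must go through atomic_long_set() so that it takes
      	 * the same hashed spinlock as the emulated bitops on ->data. */
      	static inline void set_wq_data(struct work_struct *work,
      				       struct cpu_workqueue_struct *cwq)
      	{
      		atomic_long_set(&work->data,
      				(unsigned long)cwq | (1UL << WORK_STRUCT_PENDING));
      	}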
      
      Cc: Russell King <rmk+lkml@arm.linux.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David Miller <davem@davemloft.com>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Linux Arch Maintainers <linux-arch@vger.kernel.org>
      Cc: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. Dec 10, 2006 (1 commit)
    • [PATCH] WorkStruct: Use direct assignment rather than cmpxchg() · 4594bf15
      Committed by David Howells
      Use direct assignment rather than cmpxchg() as the latter is unavailable
      and unimplementable on some platforms and is actually unnecessary.
      
      The use of cmpxchg() was to guard against two possibilities, neither of
      which can actually occur:
      
       (1) The pending flag may have been unset or may be cleared.  However, given
           where it's called, the pending flag is _always_ set.  I don't think it
           can be unset whilst we're in set_wq_data().
      
           Once the work is enqueued to be actually run, the only way off the queue
           is for it to be actually run.
      
           If it's a delayed work item, then the bit can't be cleared by the timer
           because we haven't started the timer yet.  Also, the pending bit can't be
           cleared by cancelling the delayed work _until_ the work item has had its
           timer started.
      
       (2) The workqueue pointer might change.  This can only happen in two cases:
      
           (a) The work item has just been queued to actually run, and so we're
               protected by the appropriate workqueue spinlock.
      
           (b) A delayed work item is being queued, and so the timer hasn't been
           	 started yet, and so no one else knows about the work item or can
           	 access it (the pending bit protects us).
      
           Besides, set_wq_data() _sets_ the workqueue pointer unconditionally, so
           it can be assigned instead.
      
      So, replacing the set_wq_data() with a straight assignment would be okay
      in most cases.
      
      The problem is where we end up tangling with test_and_set_bit() emulated
      using spinlocks, and even then it's not a problem _provided_
      test_and_set_bit() doesn't attempt to modify the word if the bit was
      set.
      
      If that's a problem, then a bitops-proofed assignment will be required -
      equivalent to atomic_set() vs other atomic_xxx() ops.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. Dec 08, 2006 (4 commits)
  9. Nov 22, 2006 (4 commits)
    • WorkStruct: Pass the work_struct pointer instead of context data · 65f27f38
      Committed by David Howells
      Pass the work_struct pointer to the work function rather than context data.
      The work function can use container_of() to work out the data.
      
      For the cases where the container of the work_struct may go away the moment the
      pending bit is cleared, it is made possible to defer the release of the
      structure by deferring the clearing of the pending bit.
      
      To make this work, an extra flag is introduced into the management side of the
      work_struct.  This governs auto-release of the structure upon execution.
      
      Ordinarily, the work queue executor would release the work_struct for further
      scheduling or deallocation by clearing the pending bit prior to jumping to the
      work function.  This means that, unless the driver makes some guarantee itself
      that the work_struct won't go away, the work function may not access anything
      else in the work_struct or its container lest they be deallocated.  This is a
      problem if the auxiliary data is taken away (as done by the last patch).
      
      However, if the pending bit is *not* cleared before jumping to the work
      function, then the work function *may* access the work_struct and its container
      with no problems.  But then the work function must itself release the
      work_struct by calling work_release().
      
      In most cases, automatic release is fine, so this is the default.  Special
      initiators exist for the non-auto-release case (ending in _NAR).
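
      A sketch of the new calling convention from a driver's point of view
      (struct my_device and my_dev_reset() are made-up names):

      	struct my_device {
      		struct work_struct	reset_work;
      		int			state;
      	};

      	static void my_dev_reset(struct work_struct *work)
      	{
      		/* The context is recovered from the work_struct itself. */
      		struct my_device *dev =
      			container_of(work, struct my_device, reset_work);

      		dev->state = 0;
      	}

      	/* Setup: no separate "data" argument any more. */
      	INIT_WORK(&dev->reset_work, my_dev_reset);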
      Signed-Off-By: David Howells <dhowells@redhat.com>
    • WorkStruct: Merge the pending bit into the wq_data pointer · 365970a1
      Committed by David Howells
      Reclaim a word from the size of the work_struct by folding the pending bit and
      the wq_data pointer together.  This shouldn't cause misalignment problems as
      all pointers should be at least 4-byte aligned.
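
      Schematically (the ->data field here is still a plain unsigned long; the
      atomic_long_t conversion is the entry above, and the mask names are
      illustrative):

      	#define WORK_STRUCT_PENDING		0	/* bit 0 of ->data */
      	#define WORK_STRUCT_WQ_DATA_MASK	(~(1UL << WORK_STRUCT_PENDING))

      	static inline void *get_wq_data(struct work_struct *work)
      	{
      		/* Mask off the pending bit to recover the pointer; safe
      		 * because pointers are at least 4-byte aligned. */
      		return (void *)(work->data & WORK_STRUCT_WQ_DATA_MASK);
      	}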
      Signed-Off-By: David Howells <dhowells@redhat.com>
    • WorkStruct: Typedef the work function prototype · 6bb49e59
      Committed by David Howells
      Define a type for the work function prototype.  It's not only kept in the
      work_struct struct; it's also passed as an argument to several functions.
      
      This makes it easier to change it.
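
      The typedef itself is one line; at this point the work function still takes
      the context pointer, and the "pass the work_struct pointer" entry above later
      changes the argument type:

      	typedef void (*work_func_t)(void *data);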
      Signed-Off-By: David Howells <dhowells@redhat.com>
    • WorkStruct: Separate delayable and non-delayable events. · 52bad64d
      Committed by David Howells
      Separate delayable work items from non-delayable work items by splitting them
      into a separate structure (delayed_work), which incorporates a work_struct and
      the timer_list removed from work_struct.
      
      The work_struct struct is huge, and this limits its usefulness.  On a 64-bit
      architecture it's nearly 100 bytes in size.  This reduces that by half for the
      non-delayable type of event.
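
      The resulting split, roughly (field list abridged):

      	struct work_struct {
      		unsigned long pending;
      		struct list_head entry;
      		void (*func)(void *);
      		void *data;
      		void *wq_data;
      	};

      	/* Only users that actually delay their work pay for the timer. */
      	struct delayed_work {
      		struct work_struct work;
      		struct timer_list timer;	/* moved out of work_struct */
      	};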
      Signed-Off-By: David Howells <dhowells@redhat.com>
  10. Oct 29, 2006 (1 commit)
  11. Oct 12, 2006 (1 commit)
  12. Oct 04, 2006 (1 commit)
  13. Aug 15, 2006 (1 commit)
  14. Aug 01, 2006 (1 commit)
  15. Jul 04, 2006 (1 commit)
  16. Jun 30, 2006 (2 commits)
  17. Jun 28, 2006 (1 commit)
    • [PATCH] cpu hotplug: revert init patch submitted for 2.6.17 · 9c7b216d
      Committed by Chandra Seetharaman
      In 2.6.17, there was a problem with cpu_notifiers and XFS.  I provided a
      band-aid solution to solve that problem.  In the process, I undid all the
      changes you both were making to ensure that these notifiers were available
      only at init time (unless CONFIG_HOTPLUG_CPU is defined).
      
      We deferred the real fix to 2.6.18.  Here is a set of patches that fixes the
      XFS problem cleanly and makes the cpu notifiers available only at init time
      (unless CONFIG_HOTPLUG_CPU is defined).
      
      If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run
      time.
      
      This patch reverts the notifier_call changes made in 2.6.17.
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  18. Jun 26, 2006 (2 commits)
  19. Jun 23, 2006 (1 commit)
  20. Apr 26, 2006 (1 commit)
  21. Feb 28, 2006 (1 commit)
    • [SCSI] add execute_in_process_context() API · 1fa44eca
      Committed by James Bottomley
      We have several points in the SCSI stack (primarily for our device
      functions) where we need to guarantee process context, but (given the
      place where the last reference was released) we cannot guarantee this.
      
      This API gets around the issue by executing the function directly if
      the caller has process context, but scheduling a workqueue to execute
      in process context if the caller doesn't have it.
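
      A sketch of the shape of the call; struct execute_work just wraps a
      work_struct so the caller supplies the storage, and INIT_WORK here is the
      era's three-argument form:

      	struct execute_work {
      		struct work_struct work;
      	};

      	int execute_in_process_context(void (*fn)(void *data), void *data,
      				       struct execute_work *ew)
      	{
      		if (!in_interrupt()) {
      			fn(data);		/* already in process context */
      			return 0;
      		}

      		INIT_WORK(&ew->work, fn, data);	/* punt to keventd */
      		schedule_work(&ew->work);
      		return 1;
      	}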
      Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>