1. 21 Dec, 2010 1 commit
    • workqueue: allow chained queueing during destruction · c8efcc25
      Tejun Heo authored
      Currently, destroy_workqueue() makes the workqueue deny all new
      queueing by setting WQ_DYING and flushes the workqueue once before
      proceeding with destruction; however, there are cases where work items
      queue more related work items.  Currently, such users need to
      explicitly flush the workqueue multiple times depending on the
      possible depth of such chained queueing.
      
      This patch updates the queueing path such that a work item can queue
      further work items on the same workqueue even when WQ_DYING is set.
      The flush on destruction is automatically retried until the workqueue
      is empty.  This guarantees that the workqueue is empty on destruction
      while allowing chained queueing.
      
      The flush retry logic whines if it takes too many retries to drain the
      workqueue.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
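The retry-until-empty behaviour described in this commit can be sketched with a small userspace model. This is not the kernel implementation: `toy_wq`, `toy_queue()`, `toy_flush()` and `toy_destroy()` are hypothetical names, and the `from_work_item` parameter is an assumed stand-in for however the kernel recognises that a request comes from a work item running on the same workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PENDING 16

/* Hypothetical toy model, not the kernel's struct workqueue_struct. */
struct toy_wq {
    bool dying;                     /* stands in for WQ_DYING */
    int pending[TOY_MAX_PENDING];   /* each entry: how many more items it chains */
    size_t nr_pending;
};

/* Queue an item; once the queue is dying, only chained requests are accepted. */
static bool toy_queue(struct toy_wq *wq, int chain_depth, bool from_work_item)
{
    if (wq->dying && !from_work_item)
        return false;
    if (wq->nr_pending == TOY_MAX_PENDING)
        return false;
    wq->pending[wq->nr_pending++] = chain_depth;
    return true;
}

/* One flush pass: run everything pending now; items may chain new ones. */
static void toy_flush(struct toy_wq *wq)
{
    int batch[TOY_MAX_PENDING];
    size_t n = wq->nr_pending;

    for (size_t i = 0; i < n; i++)
        batch[i] = wq->pending[i];
    wq->nr_pending = 0;

    for (size_t i = 0; i < n; i++)
        if (batch[i] > 0)
            toy_queue(wq, batch[i] - 1, true);
}

/* Mark the queue dying, then flush repeatedly until it is empty. */
static int toy_destroy(struct toy_wq *wq)
{
    int passes = 0;

    wq->dying = true;
    do {
        toy_flush(wq);
        passes++;
    } while (wq->nr_pending > 0);

    assert(wq->nr_pending == 0);
    return passes;
}
```

In this model an item that chains two further items needs three flush passes, which is the kind of depth-dependent manual flushing that callers previously had to do themselves.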
  2. 14 Dec, 2010 1 commit
    • workqueue: It is likely that WORKER_NOT_RUNNING is true · 2d64672e
      Steven Rostedt authored
      Running the annotate branch profiler on three boxes, including my
      main box that runs firefox, evolution, xchat, and is part of the distcc farm,
      showed this with the likelys in the workqueue code:
      
       correct incorrect  %        Function                  File              Line
       ------- ---------  -        --------                  ----              ----
            96   996253  99 wq_worker_sleeping             workqueue.c          703
            96   996247  99 wq_worker_waking_up            workqueue.c          677
      
      The likely()s in this case were assuming that WORKER_NOT_RUNNING will
      most likely be false. But this is not the case. The reason is
      (and shown by adding trace_printks and testing it) that most of the time
      WORKER_PREP is set.
      
      In worker_thread() we have:
      
      	worker_clr_flags(worker, WORKER_PREP);
      
      	[ do work stuff ]
      
      	worker_set_flags(worker, WORKER_PREP, false);
      
      (that 'false' means not to wake up an idle worker)
      
      wq_worker_sleeping() is called from schedule when a worker thread
      is putting itself to sleep, which happens most of the time outside
      of that [ do work stuff ].
      
      wq_worker_waking_up() is called by the wakeup worker code, which
      is also called outside that [ do work stuff ].
      
      Thus, the likely and unlikely used by those two functions are actually
      backwards.
      
      Remove the annotation and let gcc figure it out.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 26 Nov, 2010 1 commit
  4. 27 Oct, 2010 1 commit
  5. 26 Oct, 2010 1 commit
    • MN10300: Fix the PERCPU() alignment to allow for workqueues · 52605627
      David Howells authored
      In the MN10300 arch, we occasionally see an assertion being tripped in
      alloc_cwqs() at the following line:
      
              /* just in case, make sure it's actually aligned */
        --->  BUG_ON(!IS_ALIGNED(wq->cpu_wq.v, align));
              return wq->cpu_wq.v ? 0 : -ENOMEM;
      
      The values are:
      
              wq->cpu_wq.v => 0x902776e0
              align => 0x100
      
      and align is calculated by the following:
      
              const size_t align = max_t(size_t, 1 << WORK_STRUCT_FLAG_BITS,
                                         __alignof__(unsigned long long));
      
      This is because the pointer in question (wq->cpu_wq.v) loses some of its
      lower bits to control flags, and so the object it points to must be
      sufficiently aligned to avoid the need to use those bits for pointing to
      things.
      
      Currently, 4 control bits and 4 colour bits are used in normal
      circumstances, plus a debugging bit if debugging is set.  This requires
      the cpu_workqueue_struct struct to be at least 256 bytes aligned (or 512
      bytes aligned with debugging).
      
      PERCPU() alignment on MN10300, however, is only 32 bytes as set in
      vmlinux.lds.S.  So we set this to PAGE_SIZE (4096) to match most other
      arches and stick a comment in alloc_cwqs() for anyone else who triggers
      the assertion.
      Reported-by: Akira Takeuchi <takeuchi.akr@jp.panasonic.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-by: Mark Salter <msalter@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
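The alignment arithmetic in this commit can be checked in plain C. The sketch below is a userspace illustration, not kernel code: `required_align()` and `is_aligned()` are hypothetical helpers, and the bit counts (8 without debugging, 9 with) are taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Alignment needed so that the low flag_bits bits of a pointer stay free. */
static size_t required_align(unsigned int flag_bits)
{
    size_t by_flags = (size_t)1 << flag_bits;
    size_t natural = _Alignof(unsigned long long);

    return by_flags > natural ? by_flags : natural;
}

/* True if addr is a multiple of align; align must be a power of two. */
static bool is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

With 4 control bits and 4 colour bits, `required_align(8)` is 256 (0x100). The reported address 0x902776e0 ends in 0xe0, so it satisfies the 32-byte PERCPU() alignment but not the 256-byte requirement, which is why the assertion trips.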
  6. 19 Oct, 2010 2 commits
  7. 11 Oct, 2010 2 commits
    • workqueue: add and use WQ_MEM_RECLAIM flag · 6370a6ad
      Tejun Heo authored
      Add WQ_MEM_RECLAIM flag which currently maps to WQ_RESCUER, mark
      WQ_RESCUER as internal and replace all external WQ_RESCUER usages with
      WQ_MEM_RECLAIM.
      
      This makes the API users express the intent of the workqueue instead
      of indicating the internal mechanism used to guarantee forward
      progress.  This is also to make it cleaner to add more semantics to
      WQ_MEM_RECLAIM.  For example, if deemed necessary, memory reclaim
      workqueues can be made highpri.
      
      This patch doesn't introduce any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jeff Garzik <jgarzik@pobox.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
    • workqueue: fix HIGHPRI handling in keep_working() · 30310045
      Tejun Heo authored
      The policy function keep_working() didn't check GCWQ_HIGHPRI_PENDING
      and could return %false with highpri work pending.  This could lead to
      late execution of a highpri work which was delayed due to @max_active
      throttling if other works are actively consuming CPU cycles.
      
      For example, the following could happen.
      
      1. Work W0 which burns CPU cycles.
      
      2. Two works W1 and W2 are queued to a highpri wq w/ @max_active of 1.
      
      3. W1 starts executing and W2 is put to delayed queue.  W0 and W1 are
         both runnable.
      
      4. W1 finishes which puts W2 to pending queue but keep_working()
         incorrectly returns %false and the worker goes to sleep.
      
      5. W0 finishes and W2 starts execution.
      
      With this patch applied, W2 starts execution as soon as W1 finishes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
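The W0/W1/W2 scenario above can be replayed against a simplified policy function. The sketch assumes the policy is roughly "work is pending, and either at most one worker is running or highpri work is pending"; `toy_gcwq` and both functions are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the per-CPU state the policy looks at. */
struct toy_gcwq {
    int nr_pending;         /* items on the pending worklist */
    int nr_running;         /* workers currently runnable */
    bool highpri_pending;   /* stands in for GCWQ_HIGHPRI_PENDING */
};

/* Assumed shape of the policy before the fix: highpri state is ignored. */
static bool keep_working_old(const struct toy_gcwq *g)
{
    return g->nr_pending > 0 && g->nr_running <= 1;
}

/* Assumed shape after the fix: pending highpri work keeps the worker going. */
static bool keep_working_fixed(const struct toy_gcwq *g)
{
    return g->nr_pending > 0 &&
           (g->nr_running <= 1 || g->highpri_pending);
}
```

At step 4, W2 is pending and both W0 and the worker that just finished W1 count as running, so `nr_running` is 2. The old policy returns false and W2 waits for W0; the fixed policy returns true.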
  8. 05 Oct, 2010 2 commits
    • workqueue: add queue_work and activate_work trace points · cdadf009
      Tejun Heo authored
      These two tracepoints allow tracking when and how a work is queued and
      activated.  This patch is based on Frederic's patch to add queue_work
      trace point.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
    • workqueue: prepare for more tracepoints · 97bd2347
      Tejun Heo authored
      Define workqueue_work event class and use it for workqueue_execute_end
      trace point.  Also, move trace/events/workqueue.h include downwards
      such that all struct definitions are visible to it.  This is to
      prepare for more tracepoints and doesn't cause any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
  9. 19 Sep, 2010 3 commits
    • workqueue: implement flush[_delayed]_work_sync() · 09383498
      Tejun Heo authored
      Implement flush[_delayed]_work_sync().  These are flush functions
      which also make sure no CPU is still executing the target work from
      earlier queueing instances.  These are similar to
      cancel[_delayed]_work_sync() except that the target work item is
      flushed instead of cancelled.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: factor out start_flush_work() · baf59022
      Tejun Heo authored
      Factor out start_flush_work() from flush_work().  start_flush_work()
      has @wait_executing argument which controls whether the barrier is
      queued only if the work is pending or also if executing.  As
      flush_work() needs to wait for execution too, it uses %true.
      
      This commit doesn't cause any behavior difference.  start_flush_work()
      will be used to implement flush_work_sync().
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: cleanup flush/cancel functions · 401a8d04
      Tejun Heo authored
      Make the following cleanup changes.
      
      * Relocate flush/cancel function prototypes and definitions.
      
      * Relocate wait_on_cpu_work() and wait_on_work() before
        try_to_grab_pending().  These will be used to implement
        flush_work_sync().
      
      * Make all flush/cancel functions return bool instead of int.
      
      * Update wait_on_cpu_work() and wait_on_work() to return %true if they
        actually waited.
      
      * Add / update comments.
      
      This patch doesn't cause any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 13 Sep, 2010 1 commit
  11. 31 Aug, 2010 2 commits
  12. 25 Aug, 2010 2 commits
    • workqueue: fix cwq->nr_active underflow · 8a2e8e5d
      Tejun Heo authored
      cwq->nr_active is used to keep track of how many work items are active
      for the cpu workqueue, where 'active' is defined as either pending on
      global worklist or executing.  This is used to implement the
      max_active limit and workqueue freezing.  If a work item is queued
      after nr_active has already reached max_active, the work item doesn't
      increment nr_active and is put on the delayed queue and gets activated
      later as previous active work items retire.
      
      try_to_grab_pending() which is used in the cancellation path
      unconditionally decremented nr_active whether the work item being
      cancelled is currently active or delayed, so cancelling a delayed work
      item makes nr_active underflow.  This breaks max_active enforcement
      and triggers BUG_ON() in destroy_workqueue() later on.
      
      This patch fixes this bug by adding a flag WORK_STRUCT_DELAYED, which
      is set while a work item is on the delayed list, and making
      try_to_grab_pending() decrement nr_active iff the work item is
      currently active.
      
      The addition of the flag enlarges cwq alignment to 256 bytes which is
      getting a bit too large.  It's scheduled to be reduced back to 128
      bytes by merging WORK_STRUCT_PENDING and WORK_STRUCT_CWQ in the next
      devel cycle.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Johannes Berg <johannes@sipsolutions.net>
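The underflow and its fix can be reproduced with a toy counter. This is a simplified sketch: the types and functions are hypothetical, the `delayed` field plays the role of WORK_STRUCT_DELAYED, and activation of delayed items when active ones retire is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy versions of the per-CPU workqueue and a work item. */
struct toy_cwq {
    int nr_active;
    int max_active;
};

struct toy_work {
    bool queued;
    bool delayed;   /* analogue of WORK_STRUCT_DELAYED */
};

/* Items beyond max_active go to the delayed list and are not counted. */
static void toy_queue_work(struct toy_cwq *cwq, struct toy_work *w)
{
    w->queued = true;
    if (cwq->nr_active < cwq->max_active) {
        cwq->nr_active++;
        w->delayed = false;
    } else {
        w->delayed = true;
    }
}

/* Behaviour before the fix: decrement regardless of the item's state. */
static void toy_cancel_buggy(struct toy_cwq *cwq, struct toy_work *w)
{
    if (!w->queued)
        return;
    cwq->nr_active--;
    w->queued = false;
    w->delayed = false;
}

/* Behaviour after the fix: only active items give back a slot. */
static void toy_cancel_fixed(struct toy_cwq *cwq, struct toy_work *w)
{
    if (!w->queued)
        return;
    if (!w->delayed)
        cwq->nr_active--;
    w->queued = false;
    w->delayed = false;
}
```

With `max_active` of 1 and two queued items, cancelling both through the buggy path drives `nr_active` to -1, while the fixed path ends at 0.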
    • workqueue: improve destroy_workqueue() debuggability · e41e704b
      Tejun Heo authored
      Now that the worklist is global, having works pending after wq
      destruction can easily lead to oops and destroy_workqueue() has
      several BUG_ON()s to catch these cases.  Unfortunately, BUG_ON()
      doesn't tell much about how the work became pending after the final
      flush_workqueue().
      
      This patch adds WQ_DYING which is set before the final flush begins.
      If a work is requested to be queued on a dying workqueue,
      WARN_ON_ONCE() is triggered and the request is ignored.  This clearly
      indicates which caller is trying to queue a work on a dying workqueue
      and keeps the system working in most cases.
      
      Locking rule comment is updated such that the 'I' rule includes
      modifying the field from destruction path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  13. 23 Aug, 2010 2 commits
  14. 22 Aug, 2010 1 commit
    • workqueue: Add basic tracepoints to track workqueue execution · e36c886a
      Arjan van de Ven authored
      With the introduction of the new unified work queue thread pools,
      we lost one feature: It's no longer possible to know which worker
      is causing the CPU to wake out of idle. The result is that PowerTOP
      now reports a lot of "kworker/a:b" instead of more readable results.
      
      This patch adds a pair of tracepoints to the new workqueue code,
      similar in style to the timer/hrtimer tracepoints.
      
      With this pair of tracepoints, the next PowerTOP can correctly
      report which work item caused the wakeup (and how long it took):
      
      Interrupt (43)            i915      time   3.51ms    wakeups 141
      Work      ieee80211_iface_work      time   0.81ms    wakeups  29
      Work              do_dbs_timer      time   0.55ms    wakeups  24
      Process                   Xorg      time  21.36ms    wakeups   4
      Timer    sched_rt_period_timer      time   0.01ms    wakeups   1
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 16 Aug, 2010 1 commit
  16. 09 Aug, 2010 1 commit
  17. 08 Aug, 2010 1 commit
  18. 01 Aug, 2010 2 commits
    • workqueue: mark init_workqueues() as early_initcall() · 6ee0578b
      Suresh Siddha authored
      Mark init_workqueues() as early_initcall() and thus it will be initialized
      before smp bringup. init_workqueues() registers for the hotcpu notifier
      and thus it should cope with the processors that are brought online after
      the workqueues are initialized.
      
      x86 smp bringup code uses workqueues and uses a workaround for the
      cold boot process (as the workqueues are initialized post smp_init()).
      Marking init_workqueues() as early_initcall() will pave the way for
      cleaning up this code.
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • workqueue: explain for_each_*cwq_cpu() iterators · 09884951
      Tejun Heo authored
      for_each_*cwq_cpu() are similar to regular CPU iterators except that
      they also consider the pseudo CPU number used for unbound workqueues.
      Explain them.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
  19. 23 Jul, 2010 1 commit
  20. 20 Jul, 2010 2 commits
    • workqueue: fix mayday_mask handling on UP · f2e005aa
      Tejun Heo authored
      All cpumasks are assumed to have cpu 0 permanently set on UP, so it
      can't be used to signify whether there's something to be done for the
      CPU.  workqueue was using cpumask to track which CPU requested rescuer
      assistance and this led rescuer thread to think there always are
      pending mayday requests on UP, which resulted in infinite busy loops.
      
      This patch fixes the problem by introducing mayday_mask_t and
      associated helpers which wrap cpumask on SMP and emulate its behavior
      using bitops and unsigned long on UP.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
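The UP-side emulation mentioned here can be pictured as a single bit in an unsigned long. The sketch below is a plain userspace illustration with hypothetical `toy_` names; it uses ordinary non-atomic bit operations, whereas the kernel helpers are built on its bitops.

```c
#include <assert.h>
#include <stdbool.h>

/* On UP there is only CPU 0, so one bit in an unsigned long is enough. */
typedef unsigned long toy_mayday_mask_t;

/* Set the request bit and report whether it was already set. */
static bool toy_mayday_test_and_set(toy_mayday_mask_t *mask)
{
    bool was_set = (*mask & 1UL) != 0;

    *mask |= 1UL;
    return was_set;
}

/* Clear the request bit once the rescuer has handled it. */
static void toy_mayday_clear(toy_mayday_mask_t *mask)
{
    *mask &= ~1UL;
}

/* Unlike a UP cpumask, this can genuinely report "nothing pending". */
static bool toy_mayday_pending(const toy_mayday_mask_t *mask)
{
    return (*mask & 1UL) != 0;
}
```

The point of the design is that the bit can be zero, so a rescuer loop that polls it terminates instead of spinning on a mask that always claims CPU 0 is set.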
    • workqueue: fix build problem on !CONFIG_SMP · 931ac77e
      Tejun Heo authored
      Commit f3421797 (workqueue: implement unbound workqueue) incorrectly
      tested CONFIG_SMP as part of a C expression in alloc/free_cwqs().  As
      CONFIG_SMP is not defined in UP, this breaks build.  Fix it by using
      
      Found during linux-next build test.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
  21. 14 Jul, 2010 1 commit
  22. 02 Jul, 2010 7 commits
    • workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead · c7fc77f7
      Tejun Heo authored
      WQ_SINGLE_CPU combined with @max_active of 1 is used to achieve full
      ordering among works queued to a workqueue.  The same can be achieved
      using WQ_UNBOUND as unbound workqueues always use the gcwq for
      WORK_CPU_UNBOUND.  As @max_active is always one and the benefits of cpu
      locality aren't available anyway, serving them with unbound workqueues
      should be fine.
      
      Drop WQ_SINGLE_CPU support and use WQ_UNBOUND instead.  Note that most
      single thread workqueue users will be converted to use multithread or
      non-reentrant instead and only the ones which require strict ordering
      will keep using WQ_UNBOUND + @max_active of 1.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: implement unbound workqueue · f3421797
      Tejun Heo authored
      This patch implements unbound workqueue which can be specified with
      WQ_UNBOUND flag on creation.  An unbound workqueue has the following
      properties.
      
      * It uses a dedicated gcwq with a pseudo CPU number WORK_CPU_UNBOUND.
        This gcwq is always online and disassociated.
      
      * Workers are not bound to any CPU and not concurrency managed.  Works
        are dispatched to workers as soon as possible and the only applied
        limitation is @max_active.  IOW, all unbound workqueues are
        implicitly high priority.
      
      Unbound workqueues can be used as simple execution context provider.
      Contexts unbound to any cpu are served as soon as possible.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: David Howells <dhowells@redhat.com>
    • workqueue: prepare for WQ_UNBOUND implementation · bdbc5dd7
      Tejun Heo authored
      In preparation of WQ_UNBOUND addition, make the following changes.
      
      * Add WORK_CPU_* constants for pseudo cpu id numbers used (currently
        only WORK_CPU_NONE) and use them instead of NR_CPUS.  This is to
        allow another pseudo cpu id for unbound cpu.
      
      * Reorder WQ_* flags.
      
      * Make workqueue_struct->cpu_wq a union which contains a percpu
        pointer, regular pointer and an unsigned long value and use
        kzalloc/kfree() in UP allocation path.  This will be used to
        implement unbound workqueues which will use only one cwq on SMPs.
      
      * Move alloc_cwqs() allocation after initialization of wq fields, so
        that alloc_cwqs() has access to wq->flags.
      
      * Trivial relocation of wq local variables in freeze functions.
      
      These changes don't cause any functional change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix worker management invocation without pending works · d313dd85
      Tejun Heo authored
      When there's no pending work to do, worker_thread() goes back to sleep
      after waking up without checking whether worker management is
      necessary.  This means that idle worker exit requests can be ignored
      if the gcwq stays empty.
      
      Fix it by making worker_thread() always check whether worker
      management is necessary before going to sleep.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: fix incorrect cpu number BUG_ON() in get_work_gcwq() · a1e453d2
      Tejun Heo authored
      get_work_gcwq() was incorrectly triggering BUG_ON() if cpu number is
      equal to or higher than num_possible_cpus() instead of nr_cpu_ids.
      Fix it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
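The difference between the two bounds shows up when possible CPU numbers are sparse. The sketch below models the possible-CPU mask as one unsigned long; `toy_num_possible()` and `toy_nr_cpu_ids()` are hypothetical analogues of num_possible_cpus() (the count of possible CPUs) and nr_cpu_ids (the highest possible CPU number plus one).

```c
#include <assert.h>

/* Count of set bits: analogue of num_possible_cpus(). */
static unsigned int toy_num_possible(unsigned long possible_mask)
{
    unsigned int n = 0;

    for (; possible_mask; possible_mask &= possible_mask - 1)
        n++;
    return n;
}

/* Index of the highest set bit plus one: analogue of nr_cpu_ids. */
static unsigned int toy_nr_cpu_ids(unsigned long possible_mask)
{
    unsigned int ids = 0;

    for (; possible_mask; possible_mask >>= 1)
        ids++;
    return ids;
}
```

For a mask with CPUs 0, 1 and 3 possible (0xb), the count is 3 but valid CPU numbers run up to 3, so a check of the form `cpu >= count` rejects the valid CPU 3 while `cpu >= nr_cpu_ids` does not.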
    • workqueue: fix race condition in flush_workqueue() · 4ce48b37
      Tejun Heo authored
      When one flusher is cascading to the next flusher, it first sets
      wq->first_flusher to the next one and sets up the next flush cycle.
      If there's nothing to do for the next cycle, it clears
      wq->first_flusher and proceeds to the one after that.
      
      If the woken up flusher checks wq->first_flusher before it gets
      cleared, it will incorrectly assume the role of the first flusher,
      which triggers BUG_ON() sanity check.
      
      Fix it by checking wq->first_flusher again after grabbing the mutex.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: use worker_set/clr_flags() only from worker itself · cb444766
      Tejun Heo authored
      worker_set/clr_flags() assume that if none of NOT_RUNNING flags is set
      the worker must be contributing to nr_running which is only true if
      the worker is actually running.
      
      When called from the worker itself, the worker is guaranteed to be
      running, so those functions can be safely used from the worker itself
      and they aren't necessary from other places anyway.  Make the following
      changes to fix the bug.
      
      * Make worker_set/clr_flags() whine if not called from self.
      
      * Convert all places which called those functions from other tasks to
        manipulate flags directly.
      
      * Make trustee_thread() directly clear nr_running after setting
        WORKER_ROGUE on all workers.  This is the only place where
        nr_running manipulation is necessary outside of workers themselves.
      
      * While at it, add sanity check for nr_running in worker_enter_idle().
      Signed-off-by: Tejun Heo <tj@kernel.org>
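The assumption this commit describes, that a worker with no NOT_RUNNING flag set is counted in nr_running, can be sketched as follows. It is a toy model only: the flag names and values, `toy_worker`, and both helpers are hypothetical, and the counter is a plain int rather than a per-CPU atomic.

```c
#include <assert.h>

/* Hypothetical flags; any of them means "not counted in nr_running". */
enum {
    TOY_PREP          = 1 << 0,
    TOY_CPU_INTENSIVE = 1 << 1,
    TOY_NOT_RUNNING   = TOY_PREP | TOY_CPU_INTENSIVE,
};

struct toy_worker {
    unsigned int flags;
    int *nr_running;    /* shared counter the worker contributes to */
};

/* Setting the first NOT_RUNNING flag removes the worker from the count. */
static void toy_set_flags(struct toy_worker *w, unsigned int flags)
{
    if ((flags & TOY_NOT_RUNNING) && !(w->flags & TOY_NOT_RUNNING)) {
        assert(*w->nr_running > 0);
        (*w->nr_running)--;
    }
    w->flags |= flags;
}

/* Clearing the last NOT_RUNNING flag adds the worker back to the count. */
static void toy_clr_flags(struct toy_worker *w, unsigned int flags)
{
    unsigned int old = w->flags;

    w->flags &= ~flags;
    if ((old & TOY_NOT_RUNNING) && !(w->flags & TOY_NOT_RUNNING))
        (*w->nr_running)++;
}
```

The counter only stays consistent if each transition is made for a worker that is actually running, which is why calling such helpers on behalf of another, possibly sleeping, worker is unsafe.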
  23. 29 Jun, 2010 2 commits
    • workqueue: implement cpu intensive workqueue · fb0e7beb
      Tejun Heo authored
      This patch implements cpu intensive workqueue which can be specified
      with WQ_CPU_INTENSIVE flag on creation.  Works queued to a cpu
      intensive workqueue don't participate in concurrency management.  IOW,
      they don't contribute to gcwq->nr_running and thus don't delay
      execution of other works.
      
      Note that although cpu intensive works won't delay other works, they
      can be delayed by other works.  Combine with WQ_HIGHPRI to avoid being
      delayed by other works too.
      
      As the name suggests this is useful when using workqueue for cpu
      intensive works.  Workers executing cpu intensive works are not
      considered for workqueue concurrency management and left for the
      scheduler to manage.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
    • workqueue: implement high priority workqueue · 649027d7
      Tejun Heo authored
      This patch implements high priority workqueue which can be specified
      with WQ_HIGHPRI flag on creation.  A high priority workqueue has the
      following properties.
      
      * A work queued to it is queued at the head of the worklist of the
        respective gcwq after other highpri works, while normal works are
        always appended at the end.
      
      * As long as there are highpri works on gcwq->worklist,
        [__]need_more_worker() remains %true and process_one_work() wakes up
        another worker before it starts executing a work.
      
      The above two properties guarantee that works queued to high priority
      workqueues are dispatched to workers and start execution as soon as
      possible regardless of the state of other works.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
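The first property, highpri works going to the head of the worklist but behind earlier highpri works, can be sketched with an array-backed list. The kernel uses a linked list; `toy_worklist`, `toy_item` and `toy_insert()` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

struct toy_item {
    int id;
    bool highpri;
};

/* Array-backed stand-in for the shared worklist. */
struct toy_worklist {
    struct toy_item items[TOY_MAX];
    size_t len;
};

/*
 * Normal items are appended at the tail.  Highpri items are inserted
 * after the last highpri item at the head, ahead of all normal items.
 */
static bool toy_insert(struct toy_worklist *wl, int id, bool highpri)
{
    size_t pos = wl->len;

    if (wl->len == TOY_MAX)
        return false;

    if (highpri) {
        pos = 0;
        while (pos < wl->len && wl->items[pos].highpri)
            pos++;
        /* Shift the normal items right by one slot to make room. */
        memmove(&wl->items[pos + 1], &wl->items[pos],
                (wl->len - pos) * sizeof(wl->items[0]));
    }

    wl->items[pos].id = id;
    wl->items[pos].highpri = highpri;
    wl->len++;
    return true;
}
```

Queueing normal items 1 and 2, then highpri items 3 and 4, then normal item 5 yields the order 3, 4, 1, 2, 5: highpri works keep FIFO order among themselves and always precede normal ones.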