1. 25 Jan 2013 (14 commits)
    • workqueue: remove worker_pool->gcwq · 4e8f0a60
      Committed by Tejun Heo
      The only remaining user of pool->gcwq is std_worker_pool_pri().
      Reimplement it using get_gcwq() and remove worker_pool->gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      4e8f0a60
    • workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() · 38db41d9
      Committed by Tejun Heo
      for_each_std_worker_pool() takes @cpu instead of @gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      38db41d9
    • workqueue: make freezing/thawing per-pool · a1056305
      Committed by Tejun Heo
      Instead of holding locks from both pools and then processing the pools
      together, make freezing/thawing per-pool - grab the locks of one pool,
      process it, release it and then proceed to the next pool (see the
      sketch after this entry).
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      a1056305
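
      A minimal, self-contained C sketch of the per-pool pattern described
      above (a userspace model with simplified stand-in types, not the
      kernel code): each pool carries its own lock and freeze state, and the
      freezer visits the pools one at a time instead of holding both pool
      locks at once.

        #include <pthread.h>
        #include <stdbool.h>

        #define NR_STD_POOLS 2               /* normal and highpri, as in the text */

        struct worker_pool {
                pthread_mutex_t lock;        /* stand-in for pool->lock */
                bool freezing;               /* stand-in for POOL_FREEZING */
        };

        static struct worker_pool std_pools[NR_STD_POOLS] = {
                { PTHREAD_MUTEX_INITIALIZER, false },
                { PTHREAD_MUTEX_INITIALIZER, false },
        };

        /* freeze pools one by one: lock, mark, unlock, move on */
        static void freeze_std_pools(void)
        {
                for (int i = 0; i < NR_STD_POOLS; i++) {
                        struct worker_pool *pool = &std_pools[i];

                        pthread_mutex_lock(&pool->lock);
                        pool->freezing = true;
                        /* ... mark this pool's pending work as delayed ... */
                        pthread_mutex_unlock(&pool->lock);
                }
        }

      Thawing follows the same shape; order across pools changes, order
      within each pool does not.
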
    • workqueue: make hotplug processing per-pool · 94cf58bb
      Committed by Tejun Heo
      Instead of holding locks from both pools and then processing the pools
      together, make hotplug processing per-pool - grab locks of one pool,
      process it, release it and then proceed to the next pool.
      
      rebind_workers() is updated to take and process @pool instead of @gcwq
      which results in a lot of de-indentation.  gcwq_claim_assoc_and_lock()
      and its counterpart are replaced with in-line per-pool locking.
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      94cf58bb
    • workqueue: move global_cwq->lock to worker_pool · d565ed63
      Committed by Tejun Heo
      Move gcwq->lock to pool->lock.  The conversion is mostly
      straight-forward.  Things worth noting are
      
      * In many places, this removes the need to use gcwq completely.  pool
        is used directly instead.  get_std_worker_pool() is added to help
        some of these conversions.  This also leaves get_work_gcwq() without
        any user.  Removed.
      
      * In hotplug and freezer paths, the pools belonging to a CPU are often
        processed together.  This patch makes those paths hold the locks of
        all pools, with the highpri lock nested inside, to keep the
        conversion straight-forward.  These nested lockings will be removed
        by the following patches (see the sketch after this entry).
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      d565ed63
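
      Where the hotplug and freezer paths still need both pools of a CPU,
      the entry above says the pool locks are taken nested, with the highpri
      lock inside.  A hedged sketch of that helper pattern (pthread mutexes
      stand in for pool->lock; the real helpers and lockdep annotations
      differ):

        #include <pthread.h>

        struct worker_pool {
                pthread_mutex_t lock;        /* was gcwq->lock, now per pool */
        };

        struct cpu_pools {
                struct worker_pool normal;
                struct worker_pool highpri;
        };

        /* take both pool locks of one CPU, highpri nested inside normal */
        static void lock_cpu_pools(struct cpu_pools *cp)
        {
                pthread_mutex_lock(&cp->normal.lock);
                pthread_mutex_lock(&cp->highpri.lock);
        }

        static void unlock_cpu_pools(struct cpu_pools *cp)
        {
                pthread_mutex_unlock(&cp->highpri.lock);
                pthread_mutex_unlock(&cp->normal.lock);
        }
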
    • workqueue: move global_cwq->cpu to worker_pool · ec22ca5e
      Committed by Tejun Heo
      Move gcwq->cpu to pool->cpu.  This introduces a couple places where
      gcwq->pools[0].cpu is used.  These will soon go away as gcwq is
      further reduced.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      ec22ca5e
    • workqueue: move busy_hash from global_cwq to worker_pool · c9e7cf27
      Committed by Tejun Heo
      There's no functional necessity for the two pools on the same CPU to
      share the busy hash table.  It's also likely to be a bottleneck when
      implementing pools with user-specified attributes.
      
      This patch makes busy_hash per-pool (see the sketch after this entry).
      The conversion is mostly straight-forward.  Changes worth noting are:
      
      * Large block of changes in rebind_workers() is moving the block
        inside for_each_worker_pool() as now there are separate hash tables
        for each pool.  This changes the order of operations but doesn't
        break anything.
      
      * The for_each_worker_pool() loops in gcwq_unbind_fn() are combined
        into one.  This again changes the order of operations but doesn't
        break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      c9e7cf27
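
      A hedged sketch of the data-structure move described above (bucket
      count and field layout are illustrative, not the kernel's actual
      constants): the busy-worker hash leaves global_cwq and each
      worker_pool gets its own table, keyed by the work item being executed.

        #define BUSY_WORKER_HASH_SIZE 64        /* illustrative bucket count */

        struct worker;
        struct busy_bucket { struct worker *first; };

        /* before: both pools of a CPU shared one table in global_cwq */
        struct global_cwq_old {
                struct busy_bucket busy_hash[BUSY_WORKER_HASH_SIZE];
                /* struct worker_pool pools[2]; ... */
        };

        /* after: the table is per pool, so the two pools of a CPU no longer
         * contend on one shared lookup structure */
        struct worker_pool_new {
                struct busy_bucket busy_hash[BUSY_WORKER_HASH_SIZE];
                /* ... pool lock, worklist, idle list, ... */
        };
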
    • workqueue: record pool ID instead of CPU in work->data when off-queue · 7c3eed5c
      Committed by Tejun Heo
      Currently, when a work item is off-queue, work->data records the CPU
      it was last on, which is used to locate the last executing instance
      for non-reentrance, flushing, etc.
      
      We're in the process of removing global_cwq and making worker_pool the
      top level abstraction.  This patch makes work->data point to the pool
      it was last associated with instead of CPU.
      
      After the previous WORK_OFFQ_POOL_CPU and worker_pool->id additions,
      the conversion is fairly straight-forward.  WORK_OFFQ constants and
      functions are modified to record and read back the pool ID instead
      (see the sketch after this entry).  worker_pool_by_id() is added to
      allow looking up a pool from its ID.  get_work_pool() replaces
      get_work_gcwq(), which is reimplemented using get_work_pool().
      get_work_pool_id() replaces work_cpu().
      
      This patch shouldn't introduce any observable behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      7c3eed5c
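
      A self-contained sketch of the off-queue encoding described above (the
      shift and constants are illustrative, not the kernel's actual
      WORK_OFFQ_* values): while a work item is off-queue, the high bits of
      work->data carry the ID of the pool it was last associated with.

        #include <stdio.h>

        /* illustrative layout: low bits are flags, the rest is the pool ID */
        #define WORK_OFFQ_FLAG_BITS   4UL
        #define WORK_OFFQ_POOL_SHIFT  WORK_OFFQ_FLAG_BITS
        #define WORK_OFFQ_POOL_NONE   (~0UL >> WORK_OFFQ_POOL_SHIFT)

        struct work_struct {
                unsigned long data;
        };

        static void set_work_pool_id(struct work_struct *work, unsigned long pool_id)
        {
                work->data = pool_id << WORK_OFFQ_POOL_SHIFT;   /* flags dropped for brevity */
        }

        static unsigned long get_work_pool_id(struct work_struct *work)
        {
                return work->data >> WORK_OFFQ_POOL_SHIFT;
        }

        int main(void)
        {
                struct work_struct w;

                set_work_pool_id(&w, 3);                          /* last ran on pool 3 */
                printf("pool id = %lu\n", get_work_pool_id(&w));  /* prints 3 */

                set_work_pool_id(&w, WORK_OFFQ_POOL_NONE);        /* no association */
                printf("none = %lu\n", get_work_pool_id(&w));
                return 0;
        }
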
    • workqueue: add worker_pool->id · 9daf9e67
      Committed by Tejun Heo
      Add worker_pool->id which is allocated from worker_pool_idr.  This
      will be used to record the last associated worker_pool in work->data.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      9daf9e67
    • workqueue: introduce WORK_OFFQ_CPU_NONE · 715b06b8
      Committed by Tejun Heo
      Currently, when a work item is off queue, high bits of its data
      encodes the last CPU it was on.  This is scheduled to be changed to
      pool ID, which will make it impossible to use WORK_CPU_NONE to
      indicate no association.
      
      This patch limits the number of bits which are used for off-queue cpu
      number to 31 (so that the max fits in an int) and uses the highest
      possible value - WORK_OFFQ_CPU_NONE - to indicate no association.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      715b06b8
    • workqueue: make GCWQ_FREEZING a pool flag · 35b6bb63
      Committed by Tejun Heo
      Make GCWQ_FREEZING a pool flag, POOL_FREEZING.  This patch doesn't
      change locking - FREEZING on both pools of a CPU is set or cleared
      together while holding gcwq->lock.  It shouldn't cause any functional
      difference.
      
      This leaves gcwq->flags w/o any flags.  Removed.
      
      While at it, convert BUG_ON()s in freeze_workqueue_begin() and
      thaw_workqueues() to WARN_ON_ONCE().
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      35b6bb63
    • workqueue: make GCWQ_DISASSOCIATED a pool flag · 24647570
      Committed by Tejun Heo
      Make GCWQ_DISASSOCIATED a pool flag, POOL_DISASSOCIATED.  This patch
      doesn't change locking - DISASSOCIATED on both pools of a CPU is set
      or cleared together while holding gcwq->lock.  It shouldn't cause any
      functional difference.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      24647570
    • workqueue: use std_ prefix for the standard per-cpu pools · e34cdddb
      Committed by Tejun Heo
      There are currently two worker pools per cpu (including the unbound
      cpu) and they are the only pools in use.  New classes of pools are
      scheduled to be added and some pool-related APIs will be added in
      between.  Call the existing pools the standard pools and prefix them
      with std_.  Do this early so that the new APIs can use the std_ prefix
      from the beginning.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      e34cdddb
    • workqueue: unexport work_cpu() · e2905b29
      Committed by Tejun Heo
      This function no longer has any external users.  Unexport it.  It will
      be removed later on.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      e2905b29
  2. 19 Jan 2013 (2 commits)
  3. 18 Jan 2013 (1 commit)
    • workqueue: set PF_WQ_WORKER on rescuers · 111c225a
      Committed by Tejun Heo
      PF_WQ_WORKER is used to tell scheduler that the task is a workqueue
      worker and needs wq_worker_sleeping/waking_up() invoked on it for
      concurrency management.  As rescuers never participate in concurrency
      management, PF_WQ_WORKER wasn't set on them.
      
      There's a need for an interface which can query whether %current is
      executing a work item and if so which.  Such interface requires a way
      to identify all tasks which may execute work items and PF_WQ_WORKER
      will be used for that.  As all normal workers always have PF_WQ_WORKER
      set, we only need to add it to rescuers.
      
      As rescuers start with WORKER_PREP but never clear it, they're always
      NOT_RUNNING and there's no need to worry about interference with
      concurrency management even if PF_WQ_WORKER is set; however, unlike
      normal workers, rescuers currently don't have their worker struct as
      kthread_data().  They use the associated workqueue_struct instead.
      This is problematic as wq_worker_sleeping/waking_up() expect a struct
      worker at kthread_data().

      This patch adds worker->rescue_wq, starts rescuer kthreads with the
      worker struct as kthread_data(), and sets PF_WQ_WORKER on rescuers
      (see the sketch after this entry).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      111c225a
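
      A hedged userspace model of the arrangement described above
      (kthread_data() is modeled with a thread-local pointer and the field
      names are approximations): rescuers now install a struct worker as
      their per-thread data, with rescue_wq pointing back at their
      workqueue, so the scheduler-side hooks always find a worker behind a
      PF_WQ_WORKER task.

        struct workqueue_struct;                      /* opaque in this sketch */

        struct worker {
                struct workqueue_struct *rescue_wq;   /* set only for rescuers */
                /* ... current_work, pool, flags (WORKER_PREP keeps rescuers
                 *     NOT_RUNNING), ... */
        };

        /* stand-in for kthread_data(current): one worker pointer per thread */
        static __thread struct worker *current_worker;

        /* rescuers now do the same setup as normal workers */
        static void rescuer_setup(struct worker *rescuer, struct workqueue_struct *wq)
        {
                rescuer->rescue_wq = wq;
                current_worker = rescuer;   /* "kthread_data()" now yields a worker */
                /* ... and PF_WQ_WORKER is set on the task, so the scheduler
                 * hooks can safely treat it as a workqueue worker */
        }
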
  4. 20 Dec 2012 (1 commit)
    • workqueue: fix find_worker_executing_work() breakage from hashtable conversion · 023f27d3
      Committed by Tejun Heo
      42f8570f ("workqueue: use new hashtable implementation") incorrectly
      made busy workers hashed by the pointer value of the worker instead of
      the work item.  This broke find_worker_executing_work(), which in turn
      broke a lot of fundamental workqueue operations - non-reentrancy and
      flushing among others.  The flush malfunction triggered a warning in
      the disk event code in Fengguang's automated test.
      
       write_dev_root_ (3265) used greatest stack depth: 2704 bytes left
       ------------[ cut here ]------------
       WARNING: at /c/kernel-tests/src/stable/block/genhd.c:1574 disk_clear_events+0x\
      cf/0x108()
       Hardware name: Bochs
       Modules linked in:
       Pid: 3328, comm: ata_id Not tainted 3.7.0-01930-gbff6343 #1167
       Call Trace:
        [<ffffffff810997c4>] warn_slowpath_common+0x83/0x9c
        [<ffffffff810997f7>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff816aea77>] disk_clear_events+0xcf/0x108
        [<ffffffff811bd8be>] check_disk_change+0x27/0x59
        [<ffffffff822e48e2>] cdrom_open+0x49/0x68b
        [<ffffffff81ab0291>] idecd_open+0x88/0xb7
        [<ffffffff811be58f>] __blkdev_get+0x102/0x3ec
        [<ffffffff811bea08>] blkdev_get+0x18f/0x30f
        [<ffffffff811bebfd>] blkdev_open+0x75/0x80
        [<ffffffff8118f510>] do_dentry_open+0x1ea/0x295
        [<ffffffff8118f5f0>] finish_open+0x35/0x41
        [<ffffffff8119c720>] do_last+0x878/0xa25
        [<ffffffff8119c993>] path_openat+0xc6/0x333
        [<ffffffff8119cf37>] do_filp_open+0x38/0x86
        [<ffffffff81190170>] do_sys_open+0x6c/0xf9
        [<ffffffff8119021e>] sys_open+0x21/0x23
        [<ffffffff82c1c3d9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      023f27d3
  5. 19 Dec 2012 (2 commits)
    • workqueue: consider work function when searching for busy work items · a2c1c57b
      Committed by Tejun Heo
      To avoid executing the same work item concurrently, workqueue hashes
      currently busy workers according to their current work items and looks
      up the table when it wants to execute a new work item.  If there
      already is a worker which is executing the new work item, the new item
      is queued to the found worker so that it gets executed only after the
      current execution finishes.
      
      Unfortunately, a work item may be freed while being executed and thus
      recycled for different purposes.  If it gets recycled for a different
      work item and queued while the previous execution is still in
      progress, workqueue may make the new work item wait for the old one
      although the two aren't really related in any way.
      
      In extreme cases, this false dependency may lead to deadlock although
      it's extremely unlikely given that there aren't too many self-freeing
      work item users and they usually don't wait for other work items.
      
      To alleviate the problem, record the current work function in each
      busy worker and match it together with the work item address in
      find_worker_executing_work() (see the sketch after this entry).  While
      this isn't complete, it ensures that unrelated work items don't
      interact with each other, and in the very unlikely case where a
      twisted wq user triggers it, the false match is always against itself,
      making the culprit easy to spot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andrey Isakov <andy51@gmx.ru>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51701
      Cc: stable@vger.kernel.org
      a2c1c57b
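
      A self-contained sketch of the stricter match described above (the
      hash is simplified; the kernel's implementation differs in detail): a
      busy worker records both the work item's address and its function, and
      a lookup only hits when both agree, so a recycled work_struct running
      a different function no longer matches.

        #include <stddef.h>
        #include <stdint.h>

        typedef void (*work_func_t)(void *data);

        struct work_struct {
                work_func_t func;
        };

        #define NR_BUCKETS 64

        struct worker {
                struct work_struct *current_work;   /* address of the item being run */
                work_func_t current_func;           /* snapshot of its function */
                struct worker *next;                /* bucket chaining */
        };

        struct worker_pool {
                struct worker *busy_hash[NR_BUCKETS];
        };

        static unsigned int bucket_of(struct work_struct *work)
        {
                return ((uintptr_t)work >> 4) % NR_BUCKETS;
        }

        static struct worker *find_worker_executing_work(struct worker_pool *pool,
                                                         struct work_struct *work)
        {
                struct worker *w = pool->busy_hash[bucket_of(work)];

                for (; w; w = w->next)
                        if (w->current_work == work && w->current_func == work->func)
                                return w;           /* same item *and* same function */
                return NULL;
        }
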
    • workqueue: use new hashtable implementation · 42f8570f
      Committed by Sasha Levin
      Switch workqueues to use the new hashtable implementation. This reduces the
      amount of generic unrelated code in the workqueues.
      
      This patch depends on d9b482c8 ("hashtable: introduce a small and naive
      hashtable") which was merged in v3.6.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      42f8570f
  6. 04 Dec 2012 (1 commit)
    • workqueue: convert BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s · fc4b514f
      Committed by Tejun Heo
      8852aac2 ("workqueue: mod_delayed_work_on() shouldn't queue timer on
      0 delay") unexpectedly uncovered a very nasty abuse of delayed_work in
      megaraid - it allocated a work_struct, cast it to delayed_work and
      then passed it to queue_delayed_work().

      Previously, this was okay because a 0 @delay short-circuited to
      queue_work() before doing anything with the delayed_work.  8852aac2
      moved the 0 @delay test into __queue_delayed_work(), after the sanity
      check on the delayed_work, making megaraid trigger the BUG_ON().
      
      Although megaraid is already fixed by c1d390d8 ("megaraid: fix
      BUG_ON() from incorrect use of delayed work"), this patch converts the
      BUG_ON()s in __queue_delayed_work() to WARN_ON_ONCE()s so that such
      abusers, if there are more, trigger a warning but don't crash the
      machine (see the sketch after this entry).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Xiaotian Feng <xtfeng@gmail.com>
      fc4b514f
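
      A sketch of the behavior change described above, with WARN_ON_ONCE()
      modeled by a local macro (GCC statement expression) and the sanity
      check reduced to a single illustrative condition: the queueing path
      now complains once and refuses, instead of crashing the machine.

        #include <stdio.h>

        /* userspace stand-in for the kernel's WARN_ON_ONCE() */
        #define WARN_ON_ONCE(cond) ({                                   \
                static int __warned;                                    \
                int __c = !!(cond);                                     \
                if (__c && !__warned) {                                 \
                        __warned = 1;                                   \
                        fprintf(stderr, "WARNING: %s\n", #cond);        \
                }                                                       \
                __c;                                                    \
        })

        struct timer_list { int pending; };
        struct delayed_work { struct timer_list timer; };

        static int __queue_delayed_work(struct delayed_work *dwork)
        {
                /* was a BUG_ON()-style check: abuse crashed the box */
                if (WARN_ON_ONCE(dwork->timer.pending))
                        return 0;       /* complain once, refuse, keep running */
                /* ... arm the timer / queue the work ... */
                return 1;
        }
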
  7. 02 Dec 2012 (4 commits)
    • workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up() · 36576000
      Committed by Joonsoo Kim
      Recently, workqueue code has gone through some changes and we found
      some bugs related to concurrency management operations happening on
      the wrong CPU.  When a worker is concurrency managed
      (!WORKER_NOT_RUNNING), it should be bound to its associated cpu and
      woken up on that cpu.  Add WARN_ON_ONCE() to verify this.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      36576000
    • workqueue: trivial fix for return statement in work_busy() · 999767be
      Committed by Joonsoo Kim
      The return type of work_busy() is unsigned int, but one return
      statement in it returns the boolean value 'false'.  This is not a
      problem because 'false' is treated as '0', but fixing it makes the
      code more robust.
      Signed-off-by: Joonsoo Kim <js1304@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      999767be
    • workqueue: mod_delayed_work_on() shouldn't queue timer on 0 delay · 8852aac2
      Committed by Tejun Heo
      8376fe22 ("workqueue: implement mod_delayed_work[_on]()")
      implemented mod_delayed_work[_on]() using the improved
      try_to_grab_pending().  The function is later used, among others, to
      replace [__]cancel_delayed_work() + queue_delayed_work() combinations.
      
      Unfortunately, a delayed_work item w/ zero @delay is handled slightly
      differently by mod_delayed_work_on() compared to
      queue_delayed_work_on().  The latter skips the timer altogether and
      directly queues the item using queue_work_on(), while the former
      schedules a timer which will expire on the closest tick.  This means,
      when @delay is zero, that [__]cancel_delayed_work() +
      queue_delayed_work_on() makes the target item immediately executable
      while mod_delayed_work_on() may induce a delay of up to a full tick.
      
      This somewhat subtle difference breaks some of the converted users.
      e.g. block queue plugging uses delayed_work for deferred processing
      and uses mod_delayed_work_on() when the queue needs to be immediately
      unplugged.  The above problem manifested as a noticeably higher
      number of context switches under certain circumstances.
      
      The difference in behavior was caused by missing special case handling
      for 0 delay in mod_delayed_work_on() compared to
      queue_delayed_work_on().  Joonsoo Kim posted a patch to add it -
      ("workqueue: optimize mod_delayed_work_on() when @delay == 0")[1].
      The patch was queued for 3.8 but it was described as an optimization
      and I missed that it was a correctness issue.
      
      As both queue_delayed_work_on() and mod_delayed_work_on() use
      __queue_delayed_work() for queueing, it seems that the better approach
      is to move the 0 delay special handling to the function instead of
      duplicating it in mod_delayed_work_on().
      
      Fix the problem by moving the 0-delay special case handling from
      queue_delayed_work_on() to __queue_delayed_work() (see the sketch
      after this entry).  This replaces Joonsoo's patch.
      
      [1] http://thread.gmane.org/gmane.linux.kernel/1379011/focus=1379012
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Anders Kaseorg <andersk@MIT.EDU>
      Reported-and-tested-by: Zlatko Calusic <zlatko.calusic@iskon.hr>
      LKML-Reference: <alpine.DEB.2.00.1211280953350.26602@dr-wily.mit.edu>
      LKML-Reference: <50A78AA9.5040904@iskon.hr>
      Cc: Joonsoo Kim <js1304@gmail.com>
      8852aac2
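
      A minimal sketch of the fix described above (stub types and helpers,
      not the kernel signatures): the 0-delay short-circuit now lives in
      __queue_delayed_work(), so queue_delayed_work_on() and
      mod_delayed_work_on() both get it.

        #include <stdio.h>

        struct work_struct { void (*func)(struct work_struct *); };
        struct delayed_work { struct work_struct work; unsigned long timer_expires; };

        /* stand-ins for queue_work_on() and timer arming */
        static void queue_work_now(struct work_struct *work)
        {
                (void)work;
                printf("queued immediately\n");
        }

        static void arm_timer(struct delayed_work *dwork, unsigned long delay)
        {
                dwork->timer_expires = delay;           /* fires on a later tick */
                printf("timer armed for %lu\n", delay);
        }

        static void __queue_delayed_work(struct delayed_work *dwork, unsigned long delay)
        {
                if (!delay) {           /* moved here from queue_delayed_work_on() */
                        queue_work_now(&dwork->work);
                        return;
                }
                arm_timer(dwork, delay);
        }

        /* both callers now share the special case */
        static void queue_delayed_work_on(struct delayed_work *dwork, unsigned long delay)
        {
                __queue_delayed_work(dwork, delay);
        }

        static void mod_delayed_work_on(struct delayed_work *dwork, unsigned long delay)
        {
                /* ... try_to_grab_pending() omitted ... */
                __queue_delayed_work(dwork, delay);
        }
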
    • workqueue: exit rescuer_thread() as TASK_RUNNING · 412d32e6
      Committed by Mike Galbraith
      A rescuer thread exiting in TASK_INTERRUPTIBLE state can schedule
      off, never to be seen again (see the sketch after this entry).  In the
      case where this occurred, an exiting thread hit reiserfs' homebrew
      conditional resched while holding a mutex, bringing the box to its
      knees.
      
      PID: 18105  TASK: ffff8807fd412180  CPU: 5   COMMAND: "kdmflush"
       #0 [ffff8808157e7670] schedule at ffffffff8143f489
       #1 [ffff8808157e77b8] reiserfs_get_block at ffffffffa038ab2d [reiserfs]
       #2 [ffff8808157e79a8] __block_write_begin at ffffffff8117fb14
       #3 [ffff8808157e7a98] reiserfs_write_begin at ffffffffa0388695 [reiserfs]
       #4 [ffff8808157e7ad8] generic_perform_write at ffffffff810ee9e2
       #5 [ffff8808157e7b58] generic_file_buffered_write at ffffffff810eeb41
       #6 [ffff8808157e7ba8] __generic_file_aio_write at ffffffff810f1a3a
       #7 [ffff8808157e7c58] generic_file_aio_write at ffffffff810f1c88
       #8 [ffff8808157e7cc8] do_sync_write at ffffffff8114f850
       #9 [ffff8808157e7dd8] do_acct_process at ffffffff810a268f
          [exception RIP: kernel_thread_helper]
          RIP: ffffffff8144a5c0  RSP: ffff8808157e7f58  RFLAGS: 00000202
          RAX: 0000000000000000  RBX: 0000000000000000  RCX: 0000000000000000
          RDX: 0000000000000000  RSI: ffffffff8107af60  RDI: ffff8803ee491d18
          RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000000
          R10: 0000000000000000  R11: 0000000000000000  R12: 0000000000000000
          R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
      Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      412d32e6
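
      A hedged sketch of the pattern behind the fix (task states are modeled
      with an enum, not the scheduler API): a kthread that marks itself
      TASK_INTERRUPTIBLE before checking for stop must set itself back to
      TASK_RUNNING before returning, or it can be scheduled away for good.

        #include <stdbool.h>

        enum task_state { TASK_RUNNING, TASK_INTERRUPTIBLE };

        static enum task_state current_state = TASK_RUNNING;
        static bool stop_requested;

        static void set_current_state(enum task_state s) { current_state = s; }
        static bool kthread_should_stop(void)            { return stop_requested; }
        static void schedule(void)                       { /* sleep until woken */ }

        static int rescuer_thread(void *arg)
        {
                (void)arg;

                for (;;) {
                        set_current_state(TASK_INTERRUPTIBLE);

                        if (kthread_should_stop()) {
                                /* the fix: never return while still "interruptible" */
                                set_current_state(TASK_RUNNING);
                                return 0;
                        }

                        /* ... handle the mayday list ... */
                        schedule();
                }
        }
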
  8. 25 Oct 2012 (1 commit)
  9. 21 Sep 2012 (1 commit)
  10. 20 Sep 2012 (3 commits)
  11. 19 Sep 2012 (8 commits)
    • workqueue: remove @delayed from cwq_dec_nr_in_flight() · b3f9f405
      Committed by Lai Jiangshan
      @delayed is now always false for all callers; remove it.
      
      tj: Updated description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      b3f9f405
    • workqueue: fix possible stall on try_to_grab_pending() of a delayed work item · 3aa62497
      Committed by Lai Jiangshan
      Currently, when try_to_grab_pending() grabs a delayed work item, it
      leaves its linked work items alone on the delayed_works list.  The
      linked work items are always NO_COLOR and will cause a future
      cwq_activate_first_delayed() to increase cwq->nr_active incorrectly,
      which may cause the whole cwq to stall.  For example,
      
      state: cwq->max_active = 1, cwq->nr_active = 1
             one work in cwq->pool, many in cwq->delayed_works.
      
      step1: try_to_grab_pending() removes a work item from delayed_works
             but leaves its NO_COLOR linked work items on it.
      
      step2: Later on, cwq_activate_first_delayed() activates the linked
             work item increasing ->nr_active.
      
      step3: cwq->nr_active = 1, but all activated work items of the cwq are
             NO_COLOR.  When they finish, cwq->nr_active will not be
             decreased due to NO_COLOR, and no further work items will be
             activated from cwq->delayed_works.  The cwq stalls.
      
      Fix it by ensuring the target work item is activated before stealing
      PENDING in try_to_grab_pending().  This ensures that all the linked
      work items are activated without incorrectly bumping cwq->nr_active.
      
      tj: Updated comment and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@kernel.org
      3aa62497
    • workqueue: use hotcpu_notifier() for workqueue_cpu_down_callback() · a5b4e57d
      Committed by Lai Jiangshan
      workqueue_cpu_down_callback() is used only if HOTPLUG_CPU=y, so
      hotcpu_notifier() fits better than cpu_notifier().
      
      When HOTPLUG_CPU=y, hotcpu_notifier() and cpu_notifier() are the same.
      
      When HOTPLUG_CPU=n, if we use cpu_notifier(),
      workqueue_cpu_down_callback() will be called during boot to do
      nothing, and the memory of workqueue_cpu_down_callback() and
      gcwq_unbind_fn() will be discarded after boot.
      
      If we use hotcpu_notifier(), we can avoid the no-op call of
      workqueue_cpu_down_callback(), and the memory of
      workqueue_cpu_down_callback() and gcwq_unbind_fn() will be discarded
      at build time:
      
      $ ls -l kernel/workqueue.o.cpu_notifier kernel/workqueue.o.hotcpu_notifier
      -rw-rw-r-- 1 laijs laijs 484080 Sep 15 11:31 kernel/workqueue.o.cpu_notifier
      -rw-rw-r-- 1 laijs laijs 478240 Sep 15 11:31 kernel/workqueue.o.hotcpu_notifier
      
      $ size kernel/workqueue.o.cpu_notifier kernel/workqueue.o.hotcpu_notifier
         text	   data	    bss	    dec	    hex	filename
        18513	   2387	   1221	  22121	   5669	kernel/workqueue.o.cpu_notifier
        18082	   2355	   1221	  21658	   549a	kernel/workqueue.o.hotcpu_notifier
      
      tj: Updated description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      a5b4e57d
    • workqueue: use __cpuinit instead of __devinit for cpu callbacks · 9fdf9b73
      Committed by Lai Jiangshan
      For workqueue hotplug callbacks, it makes less sense to use __devinit,
      which discards the memory after boot if !HOTPLUG.  __cpuinit, which
      discards the memory after boot if !HOTPLUG_CPU, fits better.
      
      tj: Updated description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      9fdf9b73
    • workqueue: rename manager_mutex to assoc_mutex · b2eb83d1
      Committed by Lai Jiangshan
      Now that manager_mutex's role has changed from synchronizing manager
      role to excluding hotplug against manager, the name is misleading.
      
      As it is protecting the CPU-association of the gcwq now, rename it to
      assoc_mutex.
      
      This patch is pure rename and doesn't introduce any functional change.
      
      tj: Updated comments and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      b2eb83d1
    • workqueue: WORKER_REBIND is no longer necessary for idle rebinding · 5f7dabfd
      Committed by Lai Jiangshan
      Now both worker destruction and idle rebinding remove the worker from
      the idle list while it's still idle, so list_empty(&worker->entry) can
      be used to test whether either is pending, with WORKER_DIE
      distinguishing between the two, making WORKER_REBIND unnecessary.
      
      Use list_empty(&worker->entry) to determine whether destruction or
      rebinding is pending.  This simplifies worker state transitions.
      
      WORKER_REBIND is not needed anymore.  Remove it.
      
      tj: Updated comments and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      5f7dabfd
    • workqueue: WORKER_REBIND is no longer necessary for busy rebinding · eab6d828
      Committed by Lai Jiangshan
      Because the old unbind/rebinding implementation wasn't atomic w.r.t.
      GCWQ_DISASSOCIATED manipulation which is protected by
      global_cwq->lock, we had to use two flags, WORKER_UNBOUND and
      WORKER_REBIND, to avoid incorrectly losing all NOT_RUNNING bits with
      back-to-back CPU hotplug operations; otherwise, completion of
      rebinding while another unbinding is in progress could clear UNBIND
      prematurely.
      
      Now that both unbind/rebinding are atomic w.r.t. GCWQ_DISASSOCIATED,
      there's no need to use two flags.  Just one is enough.  Don't use
      WORKER_REBIND for busy rebinding.
      
      tj: Updated description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      eab6d828
    • workqueue: reimplement idle worker rebinding · ea1abd61
      Committed by Lai Jiangshan
      Currently rebind_workers() rebinds idle workers synchronously before
      proceeding to request busy workers to rebind.  This is necessary
      because all workers on @worker_pool->idle_list must be bound before
      concurrency management local wake-ups from the busy workers take
      place.
      
      Unfortunately, the synchronous idle rebinding is quite complicated.
      This patch reimplements idle rebinding to simplify the code path.
      
      Rather than trying to make all idle workers bound before rebinding
      busy workers, we simply remove all to-be-bound idle workers from the
      idle list and let them add themselves back after completing rebinding
      (successful or not).
      
      As only workers which have finished rebinding can be on the idle
      worker list, the idle worker list is guaranteed to have only bound
      workers unless the CPU went down again, and local wake-ups are safe.
      
      After the change, @worker_pool->nr_idle may deviate from the actual
      number of idle workers on @worker_pool->idle_list.  More specifically,
      nr_idle may be non-zero while ->idle_list is empty.  All users of
      ->nr_idle and ->idle_list are audited.  The only affected one is
      too_many_workers(), which is updated to return %false if ->idle_list
      is empty regardless of ->nr_idle (see the sketch after this entry).
      
      After this patch, rebind_workers() no longer performs the nasty
      idle-rebind retries which require temporary release of gcwq->lock, and
      both unbinding and rebinding are atomic w.r.t. global_cwq->lock.
      
      worker->idle_rebind and global_cwq->rebind_hold are now unnecessary
      and removed along with the definition of struct idle_rebind.
      
      Changed from V1:
      	1) remove unlikely from too_many_workers(), ->idle_list can be empty
      	   anytime, even before this patch, no reason to use unlikely.
      	2) fix a small rebasing mistake.
      	   (which is from rebasing the original fixing patch to for-next)
      	3) add a lot of comments.
      	4) clear WORKER_REBIND unconditionally in idle_worker_rebind()
      
      tj: Updated comments and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ea1abd61
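
      A hedged sketch of the too_many_workers() adjustment described above
      (the threshold and ratio are illustrative, not the kernel's
      constants): with idle rebinding taking workers off ->idle_list while
      nr_idle stays up, the check bails out whenever the idle list is empty.

        #include <stdbool.h>

        struct worker_pool {
                int nr_idle;            /* may stay non-zero during rebinding */
                int nr_workers;
                bool idle_list_empty;   /* stand-in for list_empty(&pool->idle_list) */
        };

        static bool too_many_workers(struct worker_pool *pool)
        {
                int nr_idle = pool->nr_idle;
                int nr_busy = pool->nr_workers - nr_idle;

                /* after the change: nr_idle can't be trusted while workers
                 * are off rebinding, so an empty idle list means "don't reap" */
                if (pool->idle_list_empty)
                        return false;

                return nr_idle > 2 && (nr_idle - 2) * 4 >= nr_busy;   /* illustrative ratio */
        }
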
  12. 18 Sep 2012 (1 commit)
    • workqueue: always clear WORKER_REBIND in busy_worker_rebind_fn() · 960bd11b
      Committed by Lai Jiangshan
      busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed
      (CPU is down again).  This used to be okay because the flag wasn't
      used for anything else.
      
      However, after 25511a47 "workqueue: reimplement CPU online rebinding
      to handle idle workers", WORKER_REBIND is also used to command idle
      workers to rebind.  If not cleared, the worker may confuse the next
      CPU_UP cycle by having REBIND spuriously set or oops / get stuck by
      prematurely calling idle_worker_rebind().
      
        WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5
       00()
        Hardware name: Bochs
        Modules linked in: test_wq(O-)
        Pid: 33, comm: kworker/1:1 Tainted: G           O 3.6.0-rc1-work+ #3
        Call Trace:
         [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0
         [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20
         [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        ---[ end trace e977cf20f4661968 ]---
        BUG: unable to handle kernel NULL pointer dereference at           (null)
        IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500
        PGD 0
        Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
        Modules linked in: test_wq(O-)
        CPU 0
        Pid: 33, comm: kworker/1:1 Tainted: G        W  O 3.6.0-rc1-work+ #3 Bochs Bochs
        RIP: 0010:[<ffffffff810b3db0>]  [<ffffffff810b3db0>] worker_thread+0x360/0x500
        RSP: 0018:ffff88001e1c9de0  EFLAGS: 00010086
        RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140
        RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009
        RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001
        R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580
        R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900
        FS:  0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
        CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
        Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900)
        Stack:
         ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010
         ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900
         ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340
        Call Trace:
         [<ffffffff810bc16e>] kthread+0xbe/0xd0
         [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10
        Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b
        RIP  [<ffffffff810b3db0>] worker_thread+0x360/0x500
         RSP <ffff88001e1c9de0>
        CR2: 0000000000000000
      
      There was no reason to keep WORKER_REBIND on failure in the first
      place - WORKER_UNBOUND is guaranteed to be set in such cases,
      preventing concurrency management from being incorrectly activated.
      Always clear WORKER_REBIND.
      
      tj: Updated comment and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      960bd11b
  13. 11 Sep 2012 (1 commit)
    • workqueue: fix possible idle worker depletion across CPU hotplug · ee378aa4
      Committed by Lai Jiangshan
      To simplify both normal and CPU hotplug paths, worker management is
      prevented while CPU hotplug is in progress.  This is achieved by CPU
      hotplug holding the same exclusion mechanism used by workers to ensure
      there's only one manager per pool.
      
      If someone else seems to be performing the manager role, workers
      proceed to execute work items.  CPU hotplug using the same mechanism
      can lead to idle worker depletion because all workers could proceed to
      execute work items while CPU hotplug is in progress and CPU hotplug
      itself wouldn't actually perform the worker management duty - it
      doesn't guarantee that there's an idle worker left when it releases
      management.
      
      This idle worker depletion, under extreme circumstances, can break
      forward-progress guarantee and thus lead to deadlock.
      
      This patch fixes the bug by using separate mechanisms for manager
      exclusion among workers and hotplug exclusion.  For manager exclusion,
      POOL_MANAGING_WORKERS which was restored by the previous patch is
      used.  pool->manager_mutex is now only used for exclusion between the
      elected manager and CPU hotplug.  The elected manager won't proceed
      without holding pool->manager_mutex.
      
      This ensures that the worker which won the manager position can't skip
      managing while CPU hotplug is in progress.  It will block on
      manager_mutex and perform management after CPU hotplug is complete.
      
      Note that hotplug may happen while waiting for manager_mutex.  A
      manager is on neither the idle nor the busy list, so the hotplug code
      can't unbind/rebind it.  Make the manager handle its own un/rebinding
      (see the sketch after this entry).
      
      tj: Updated comment and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ee378aa4
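
      A hedged userspace model of the two separate exclusion mechanisms
      described above (an atomic flag stands in for POOL_MANAGING_WORKERS
      and a pthread mutex for pool->manager_mutex; names are approximate):

        #include <pthread.h>
        #include <stdatomic.h>
        #include <stdbool.h>

        struct worker_pool {
                atomic_flag managing;           /* stand-in for POOL_MANAGING_WORKERS */
                pthread_mutex_t manager_mutex;  /* hotplug vs. elected manager */
        };

        /* a worker tries to become the manager */
        static bool maybe_become_manager(struct worker_pool *pool)
        {
                if (atomic_flag_test_and_set(&pool->managing))
                        return false;           /* someone else manages; go run work */

                /* the elected manager also waits out any hotplug in progress */
                pthread_mutex_lock(&pool->manager_mutex);
                /* ... create/destroy workers as needed ... */
                pthread_mutex_unlock(&pool->manager_mutex);

                atomic_flag_clear(&pool->managing);
                return true;
        }

        /* CPU hotplug only takes the mutex; it no longer claims the manager role */
        static void hotplug_claim(struct worker_pool *pool)
        {
                pthread_mutex_lock(&pool->manager_mutex);
        }

        static void hotplug_release(struct worker_pool *pool)
        {
                pthread_mutex_unlock(&pool->manager_mutex);
        }

      With the roles split this way, the worker that wins the manager
      election can no longer skip managing just because hotplug happens to
      hold the lock; it simply blocks and manages afterwards.
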