  1. 11 Sep, 2012 2 commits
    • workqueue: fix possible idle worker depletion across CPU hotplug · ee378aa4
      Lai Jiangshan authored
      To simplify both normal and CPU hotplug paths, worker management is
      prevented while CPU hotplug is in progress.  This is achieved by CPU
      hotplug holding the same exclusion mechanism used by workers to ensure
      there's only one manager per pool.
      
      If someone else seems to be performing the manager role, workers
      proceed to execute work items.  CPU hotplug using the same mechanism
      can lead to idle worker depletion because all workers could proceed to
      execute work items while CPU hotplug is in progress and CPU hotplug
      itself wouldn't actually perform the worker management duty - it
      doesn't guarantee that there's an idle worker left when it releases
      management.
      
      This idle worker depletion, under extreme circumstances, can break
      forward-progress guarantee and thus lead to deadlock.
      
      This patch fixes the bug by using separate mechanisms for manager
      exclusion among workers and hotplug exclusion.  For manager exclusion,
      POOL_MANAGING_WORKERS which was restored by the previous patch is
      used.  pool->manager_mutex is now only used for exclusion between the
      elected manager and CPU hotplug.  The elected manager won't proceed
      without holding pool->manager_mutex.
      
      This ensures that the worker which won the manager position can't skip
      managing while CPU hotplug is in progress.  It will block on
      manager_mutex and perform management after CPU hotplug is complete.
      
      Note that hotplug may happen while waiting for manager_mutex.  A
      manager is on neither the idle nor the busy list and thus the hotplug
      code can't unbind/rebind it.  Make the manager handle its own
      un/rebinding.
      
      tj: Updated comment and description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
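The manager election described above can be modeled in a few lines. This is a minimal single-threaded sketch, not the kernel code: `struct pool_model`, both helper names, and the flag value are hypothetical, and the locking that would protect `flags` in the kernel is left out. It only shows that a worker which loses the election goes on to execute work items, while the winner is the one expected to block on `pool->manager_mutex`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag value; only the name comes from the commit message. */
#define POOL_MANAGING_WORKERS (1u << 0)

/* Simplified stand-in for a worker pool (assumption, not the kernel struct). */
struct pool_model {
	unsigned int flags;
};

/* Returns true if the caller won the manager election for this pool. */
static bool try_claim_manager(struct pool_model *pool)
{
	if (pool->flags & POOL_MANAGING_WORKERS)
		return false;	/* someone else manages; go execute work items */

	pool->flags |= POOL_MANAGING_WORKERS;
	/*
	 * In the scheme described above, the winner would now take
	 * pool->manager_mutex and therefore wait for CPU hotplug to finish
	 * instead of skipping management.
	 */
	return true;
}

/* Gives up the manager role so that another worker can claim it. */
static void release_manager(struct pool_model *pool)
{
	pool->flags &= ~POOL_MANAGING_WORKERS;
}
```

Because the flag and the mutex are separate, CPU hotplug holding the mutex no longer looks like "someone else is managing" to the workers.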
    • workqueue: restore POOL_MANAGING_WORKERS · 552a37e9
      Lai Jiangshan authored
      This patch restores POOL_MANAGING_WORKERS which was replaced by
      pool->manager_mutex by 60373152 "workqueue: use mutex for global_cwq
      manager exclusion".
      
      There's a subtle idle worker depletion bug across CPU hotplug events
      and we need to distinguish an actual manager and CPU hotplug
      preventing management.  POOL_MANAGING_WORKERS will be used for the
      former and manager_mutex the latter.
      
      This patch just lays POOL_MANAGING_WORKERS on top of the existing
      manager_mutex and doesn't introduce any synchronization changes.  The
      next patch will update it.
      
      Note that this patch fixes a non-critical anomaly where
      too_many_workers() may return %true spuriously while CPU hotplug is in
      progress.  While the issue could schedule idle timer spuriously, it
      didn't trigger any actual misbehavior.
      
      tj: Rewrote patch description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  2. 06 Sep, 2012 2 commits
    • workqueue: fix possible deadlock in idle worker rebinding · ec58815a
      Tejun Heo authored
      Currently, rebind_workers() and idle_worker_rebind() are two-way
      interlocked.  rebind_workers() waits for idle workers to finish
      rebinding and rebound idle workers wait for rebind_workers() to finish
      rebinding busy workers before proceeding.
      
      Unfortunately, this isn't enough.  The second wait from idle workers
      is implemented as follows.
      
      	wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
      
      rebind_workers() clears WORKER_REBIND, wakes up the idle workers and
      then returns.  If a CPU hotplug cycle happens again before one of the
      idle workers finishes the above wait_event(), rebind_workers() will
      repeat the first part of the handshake - set WORKER_REBIND again and
      wait for the idle worker to finish rebinding - and this leads to
      deadlock because the idle worker would be waiting for WORKER_REBIND to
      clear.
      
      This is fixed by adding another interlocking step at the end -
      rebind_workers() now waits for all the idle workers to finish the
      above WORKER_REBIND wait before returning.  This ensures that all
      rebinding steps are complete on all idle workers before the next
      hotplug cycle can happen.
      
      This problem was diagnosed by Lai Jiangshan who also posted a patch to
      fix the issue, upon which this patch is based.
      
      This is the minimal fix and further patches are scheduled for the next
      merge window to simplify the CPU hotplug path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Original-patch-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>
    • workqueue: move WORKER_REBIND clearing in rebind_workers() to the end of the function · 90beca5d
      Tejun Heo authored
      This doesn't make any functional difference and is purely to help the
      next patch to be simpler.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
  3. 05 Sep, 2012 1 commit
    • workqueue: UNBOUND -> REBIND morphing in rebind_workers() should be atomic · 96e65306
      Lai Jiangshan authored
      The compiler may compile the following code into TWO write/modify
      instructions.
      
      	worker->flags &= ~WORKER_UNBOUND;
      	worker->flags |= WORKER_REBIND;
      
      so the other CPU may temporarily see worker->flags which doesn't have
      either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup
      prematurely.
      
      Fix it by using single explicit assignment via ACCESS_ONCE().
      
      Because idle workers have another WORKER_NOT_RUNNING flag, this bug
      doesn't exist for them; however, update it to use the same pattern for
      consistency.
      
      tj: Applied the change to idle workers too and updated comments and
          patch description a bit.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
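The single-assignment pattern can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the patch itself: the flag values and `struct flag_worker_model` are hypothetical, and `ACCESS_ONCE()` is redefined locally as a volatile access using the GCC/Clang `__typeof__` extension. The new flag word is computed in a local variable and then published with one store, so another CPU should not observe an intermediate value with neither flag set (assuming an aligned word store is a single instruction on the target).

```c
#include <assert.h>

/* Hypothetical flag values; only the names come from the commit message. */
#define WORKER_UNBOUND (1u << 0)
#define WORKER_REBIND  (1u << 1)

/* Local stand-in for the kernel macro: forces one volatile access. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Simplified stand-in for struct worker (assumption). */
struct flag_worker_model {
	unsigned int flags;
};

/* Morph UNBOUND into REBIND with a single store to worker->flags. */
static void morph_unbound_to_rebind(struct flag_worker_model *worker)
{
	unsigned int new_flags = worker->flags;

	/* Build the final value locally; no intermediate state is visible. */
	new_flags |= WORKER_REBIND;
	new_flags &= ~WORKER_UNBOUND;

	ACCESS_ONCE(worker->flags) = new_flags;
}
```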
  4. 23 Jul, 2012 1 commit
    • workqueue: fix spurious CPU locality WARN from process_one_work() · 6fec10a1
      Tejun Heo authored
      25511a47 "workqueue: reimplement CPU online rebinding to handle idle
      workers" added CPU locality sanity check in process_one_work().  It
      triggers if a worker is executing on a different CPU without UNBOUND
      or REBIND set.
      
      This works for all normal workers but rescuers can trigger this
      spuriously when they're serving the unbound or a disassociated
      global_cwq - rescuers don't have either flag set and thus their
      gcwq->cpu can be a different value including %WORK_CPU_UNBOUND.
      
      Fix it by additionally testing %GCWQ_DISASSOCIATED.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      LKML-Reference: <20120721213656.GA7783@linux.vnet.ibm.com>
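The condition of the sanity check after this fix can be written as a small predicate. This is a hedged sketch rather than the kernel code: the function name, its parameters, and the flag values are hypothetical, and it assumes the unbound global_cwq is always marked disassociated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; only the names come from the commit message. */
#define WORKER_UNBOUND     (1u << 0)
#define WORKER_REBIND      (1u << 1)
#define GCWQ_DISASSOCIATED (1u << 0)

/*
 * Returns true when the locality check should warn: the worker claims
 * to be bound (neither UNBOUND nor REBIND), its gcwq is associated with
 * a CPU, and yet the worker is running on a different CPU.
 */
static bool locality_violation(unsigned int worker_flags,
			       unsigned int gcwq_flags,
			       int running_cpu, int gcwq_cpu)
{
	return !(worker_flags & (WORKER_UNBOUND | WORKER_REBIND)) &&
	       !(gcwq_flags & GCWQ_DISASSOCIATED) &&
	       running_cpu != gcwq_cpu;
}
```

A rescuer carries neither worker flag, so only the extra `GCWQ_DISASSOCIATED` test keeps it from tripping the check when it serves a disassociated pool.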
  5. 18 Jul, 2012 9 commits
    • workqueue: simplify CPU hotplug code · 8db25e78
      Tejun Heo authored
      With trustee gone, CPU hotplug code can be simplified.
      
      * gcwq_claim/release_management() now grab and release gcwq lock too
        respectively and gained _and_lock and _and_unlock postfixes.
      
      * All CPU hotplug logic was implemented in workqueue_cpu_callback()
        which was called by workqueue_cpu_up/down_callback() for the correct
        priority.  This was because up and down paths shared a lot of logic,
        which is no longer true.  Remove workqueue_cpu_callback() and move
        all hotplug logic into the two actual callbacks.
      
      This patch doesn't make any functional changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: remove CPU offline trustee · 628c78e7
      Tejun Heo authored
      With the previous changes, a disassociated global_cwq now can run as
      an unbound one on its own - it can create workers as necessary to
      drain remaining works after the CPU has been brought down and manage
      the number of workers using the usual idle timer mechanism making
      trustee completely redundant except for the actual unbinding
      operation.
      
      This patch removes the trustee and lets a disassociated global_cwq
      manage itself.  Unbinding is moved to a work item (for CPU affinity)
      which is scheduled and flushed from CPU_DOWN_PREPARE.
      
      This patch moves nr_running clearing outside gcwq and manager locks to
      simplify the code.  As nr_running is unused at the point, this is
      safe.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: don't butcher idle workers on an offline CPU · 3ce63377
      Tejun Heo authored
      Currently, during CPU offlining, after all pending work items are
      drained, the trustee butchers all workers.  Also, on CPU onlining
      failure, workqueue_cpu_callback() ensures that the first idle worker
      is destroyed.  Combined, these guarantee that an offline CPU doesn't
      have any worker for it once all the lingering work items are finished.
      
      This guarantee isn't really necessary and makes CPU on/offlining more
      expensive than it needs to be, especially for platforms which use CPU
      hotplug for powersaving.
      
      This patch removes idle worker butchering from the trustee and lets
      a CPU which failed onlining keep the created first worker.  The
      first worker is created if the CPU doesn't have any
      during CPU_DOWN_PREPARE and started right away.  If onlining succeeds,
      the rebind_workers() call in CPU_ONLINE will rebind it like any other
      workers.  If onlining fails, the worker is left alone till the next
      try.
      
      This makes CPU hotplugs cheaper by allowing global_cwqs to keep
      workers across them and simplifies code.
      
      Note that trustee doesn't re-arm idle timer when it's done and thus
      the disassociated global_cwq will keep all workers until it comes back
      online.  This will be improved by further patches.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: reimplement CPU online rebinding to handle idle workers · 25511a47
      Tejun Heo authored
      Currently, if there are workers left when a CPU is being brought back
      online, the trustee kills all idle workers and schedules rebind_work
      so that they re-bind to the CPU after the currently executing work is
      finished.  This works for busy workers because concurrency management
      doesn't try to wake them up from scheduler callbacks, which require
      the target task to be on the local run queue.  The busy worker bumps
      concurrency counter appropriately as it clears WORKER_UNBOUND from the
      rebind work item and it's bound to the CPU before returning to the
      idle state.
      
      To reduce CPU on/offlining overhead (as many embedded systems use it
      for powersaving) and simplify the code path, workqueue is planned to
      be modified to retain idle workers across CPU on/offlining.  This
      patch reimplements CPU online rebinding such that it can also handle
      idle workers.
      
      As noted earlier, due to the local wakeup requirement, rebinding idle
      workers is tricky.  All idle workers must be re-bound before scheduler
      callbacks are enabled.  This is achieved by interlocking idle
      re-binding.  Idle workers are requested to re-bind and then hold until
      all idle re-binding is complete so that no bound worker starts
      executing work item.  Only after all idle workers are re-bound and
      parked, CPU_ONLINE proceeds to release them and queue rebind work item
      to busy workers thus guaranteeing scheduler callbacks aren't invoked
      until all idle workers are ready.
      
      worker_rebind_fn() is renamed to busy_worker_rebind_fn() and
      idle_worker_rebind() for idle workers is added.  Rebinding logic is
      moved to rebind_workers() and now called from CPU_ONLINE after
      flushing trustee.  While at it, add CPU sanity check in
      worker_thread().
      
      Note that now a worker may become idle or the manager between trustee
      release and rebinding during CPU_ONLINE.  As the previous patch
      updated create_worker() so that it can be used by regular manager
      while unbound and this patch implements idle re-binding, this is safe.
      
      This prepares for removal of trustee and keeping idle workers across
      CPU hotplugs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: drop @bind from create_worker() · bc2ae0f5
      Tejun Heo authored
      Currently, create_worker()'s callers are responsible for deciding
      whether the newly created worker should be bound to the associated CPU
      and create_worker() sets WORKER_UNBOUND only for the workers for the
      unbound global_cwq.  Creation during normal operation is always via
      maybe_create_worker() and @bind is true.  For workers created during
      hotplug, @bind is false.
      
      Normal operation path is planned to be used even while the CPU is
      going through hotplug operations or offline and this static decision
      won't work.
      
      Drop @bind from create_worker() and decide whether to bind by looking
      at GCWQ_DISASSOCIATED.  create_worker() will also set WORKER_UNBOUND
      automatically if disassociated.  To avoid flipping GCWQ_DISASSOCIATED
      while create_worker() is in progress, the flag is now allowed to be
      changed only while holding all manager_mutexes on the global_cwq.
      
      This requires that GCWQ_DISASSOCIATED is not cleared behind trustee's
      back.  CPU_ONLINE no longer clears DISASSOCIATED before flushing
      trustee, which clears DISASSOCIATED before rebinding remaining workers
      if asked to release.  For cases where trustee isn't around, CPU_ONLINE
      clears DISASSOCIATED after flushing trustee.  Also, now, first_idle
      has UNBOUND set on creation which is explicitly cleared by CPU_ONLINE
      while binding it.  These convolutions will soon be removed by further
      simplification of CPU hotplug path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: use mutex for global_cwq manager exclusion · 60373152
      Tejun Heo authored
      POOL_MANAGING_WORKERS is used to ensure that at most one worker takes
      the manager role at any given time on a given global_cwq.  Trustee
      later hitched on it to assume the manager role, adding a blocking
      wait for the bit.  As trustee already needed a custom wait mechanism,
      waiting for
      MANAGING_WORKERS was rolled into the same mechanism.
      
      Trustee is scheduled to be removed.  This patch separates out
      MANAGING_WORKERS wait into per-pool mutex.  Workers use
      mutex_trylock() to test for manager role and trustee uses mutex_lock()
      to claim manager roles.
      
      gcwq_claim/release_management() helpers are added to grab and release
      manager roles of all pools on a global_cwq.  gcwq_claim_management()
      always grabs pool manager mutexes in ascending pool index order and
      uses pool index as lockdep subclass.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: ROGUE workers are UNBOUND workers · 403c821d
      Tejun Heo authored
      Currently, WORKER_UNBOUND is used to mark workers for the unbound
      global_cwq and WORKER_ROGUE is used to mark workers for disassociated
      per-cpu global_cwqs.  Both are used to make the marked worker skip
      concurrency management and the only place they make any difference is
      in worker_enter_idle() where WORKER_ROGUE is used to skip scheduling
      idle timer, which can easily be replaced with trustee state testing.
      
      This patch replaces WORKER_ROGUE with WORKER_UNBOUND and drops
      WORKER_ROGUE.  This is to prepare for removing trustee and handling
      disassociated global_cwqs as unbound.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: drop CPU_DYING notifier operation · f2d5a0ee
      Tejun Heo authored
      Workqueue used CPU_DYING notification to mark GCWQ_DISASSOCIATED.
      This was necessary because workqueue's CPU_DOWN_PREPARE happened
      before other DOWN_PREPARE notifiers and workqueue needed to stay
      associated across the rest of DOWN_PREPARE.
      
      After the previous patch, workqueue's DOWN_PREPARE happens after
      others and can set GCWQ_DISASSOCIATED directly.  Drop CPU_DYING and
      let the trustee set GCWQ_DISASSOCIATED after disabling concurrency
      management.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
    • workqueue: perform cpu down operations from low priority cpu_notifier() · 65758202
      Tejun Heo authored
      Currently, all workqueue cpu hotplug operations run off
      CPU_PRI_WORKQUEUE which is higher than normal notifiers.  This is to
      ensure that workqueue is up and running while bringing up a CPU before
      other notifiers try to use workqueue on the CPU.
      
      Per-cpu workqueues are supposed to remain working and bound to the CPU
      for normal CPU_DOWN_PREPARE notifiers.  This holds mostly true even
      with workqueue offlining running with higher priority because
      workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
      runs the per-cpu workqueue without concurrency management without
      explicitly detaching the existing workers.
      
      However, if the trustee needs to create new workers, it creates
      unbound workers which may wander off to other CPUs while
      CPU_DOWN_PREPARE notifiers are in progress.  Furthermore, if the CPU
      down is cancelled, the per-CPU workqueue may end up with workers which
      aren't bound to the CPU.
      
      While reliably reproducible with a convoluted artificial test-case
      involving scheduling and flushing CPU burning work items from CPU down
      notifiers, this isn't very likely to happen in the wild, and, even
      when it happens, the effects are likely to be hidden by the following
      successful CPU down.
      
      Fix it by using different priorities for up and down notifiers - high
      priority for up operations and low priority for down operations.
      
      Workqueue cpu hotplug operations will soon go through further cleanup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
  6. 14 Jul, 2012 2 commits
    • workqueue: reimplement WQ_HIGHPRI using a separate worker_pool · 3270476a
      Tejun Heo authored
      WQ_HIGHPRI was implemented by queueing highpri work items at the head
      of the global worklist.  Other than queueing at the head, they weren't
      handled differently; unfortunately, this could lead to execution
      latency of a few seconds on heavily loaded systems.
      
      Now that workqueue code has been updated to deal with multiple
      worker_pools per global_cwq, this patch reimplements WQ_HIGHPRI using
      a separate worker_pool.  NR_WORKER_POOLS is bumped to two and
      gcwq->pools[0] is used for normal pri work items and ->pools[1] for
      highpri.  Highpri workers get -20 nice level and have an 'H' suffix in
      their names.  Note that this change increases the number of kworkers
      per cpu.
      
      POOL_HIGHPRI_PENDING, pool_determine_ins_pos() and highpri chain
      wakeup code in process_one_work() are no longer used and removed.
      
      This allows proper prioritization of highpri work items and removes
      high execution latency of highpri work items.
      
      v2: nr_running indexing bug in get_pool_nr_running() fixed.
      
      v3: Refreshed for the get_pool_nr_running() update in the previous
          patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Josh Hunt <joshhunt00@gmail.com>
      LKML-Reference: <CAKA=qzaHqwZ8eqpLNFjxnO2fX-tgAOjmpvxgBFjv6dJeQaOW1w@mail.gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
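The two-pool layout described in this commit can be illustrated with a short sketch. It is a model under assumptions, not the kernel implementation: the helper names and the `WQ_HIGHPRI` bit value are hypothetical, and the `kworker/<cpu>:<id>` name format is assumed. Only the pool indices, the -20 nice level and the 'H' suffix come from the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_WORKER_POOLS 2
#define WQ_HIGHPRI (1u << 4)	/* hypothetical bit value */

/* pools[0] serves normal priority work items, pools[1] serves highpri. */
static int pool_index_for(unsigned int wq_flags)
{
	return (wq_flags & WQ_HIGHPRI) ? 1 : 0;
}

/* Workers of the highpri pool run at nice -20, others at the default 0. */
static int pool_nice(int pool_index)
{
	return pool_index == 1 ? -20 : 0;
}

/* Highpri workers carry an 'H' suffix; the name format is an assumption. */
static void format_worker_name(char *buf, size_t len, unsigned int cpu,
			       int id, int pool_index)
{
	snprintf(buf, len, "kworker/%u:%d%s", cpu, id,
		 pool_index == 1 ? "H" : "");
}
```

Keeping highpri items in their own pool means they no longer share a worklist with normal items, which is what removes the latency that head-of-list queueing could not avoid.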
    • workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool() · 4ce62e9e
      Tejun Heo authored
      Introduce NR_WORKER_POOLS and for_each_worker_pool() and convert code
      paths which need to manipulate all pools in a gcwq to use them.
      NR_WORKER_POOLS is currently one and for_each_worker_pool() iterates
      over only @gcwq->pool.
      
      Note that nr_running is per-pool property and converted to an array
      with NR_WORKER_POOLS elements and renamed to pool_nr_running.  Note
      that get_pool_nr_running() currently assumes 0 index.  The next patch
      will make use of non-zero index.
      
      The changes in this patch are mechanical and don't cause any
      functional difference.  This is to prepare for multiple pools per
      gcwq.
      
      v2: nr_running indexing bug in get_pool_nr_running() fixed.
      
      v3: Pointer to array is stupid.  Don't use it in get_pool_nr_running()
          as suggested by Linus.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
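An iterator that visits only the single embedded pool might look like the sketch below. The struct definitions are simplified stand-ins, and the macro body is one plausible form for the one-pool case, written as an assumption rather than copied from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_WORKER_POOLS 1

/* Simplified stand-ins for the kernel structures (assumptions). */
struct worker_pool_model {
	int nr_workers;
};

struct gcwq_model {
	struct worker_pool_model pool;	/* the only pool at this stage */
};

/* Visit gcwq->pool once, then stop by setting the cursor to NULL. */
#define for_each_worker_pool(pool, gcwq) \
	for ((pool) = &(gcwq)->pool; (pool); (pool) = NULL)

/* Counts how many pools the iterator visits. */
static int count_pools(struct gcwq_model *gcwq)
{
	struct worker_pool_model *pool;
	int n = 0;

	for_each_worker_pool(pool, gcwq)
		n++;
	return n;
}
```

Callers written against the macro need no change when the number of pools grows; only the macro body and the struct layout do.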
  7. 13 Jul, 2012 4 commits
    • workqueue: separate out worker_pool flags · 11ebea50
      Tejun Heo authored
      GCWQ_MANAGE_WORKERS, GCWQ_MANAGING_WORKERS and GCWQ_HIGHPRI_PENDING
      are per-pool properties.  Add worker_pool->flags and make the above
      three flags per-pool flags.
      
      The changes in this patch are mechanical and don't cause any
      functional difference.  This is to prepare for multiple pools per
      gcwq.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: use @pool instead of @gcwq or @cpu where applicable · 63d95a91
      Tejun Heo authored
      Modify all functions which deal with per-pool properties to pass
      around @pool instead of @gcwq or @cpu.
      
      The changes in this patch are mechanical and don't cause any
      functional difference.  This is to prepare for multiple pools per
      gcwq.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • workqueue: factor out worker_pool from global_cwq · bd7bdd43
      Tejun Heo authored
      Move worklist and all worker management fields from global_cwq into
      the new struct worker_pool.  worker_pool points back to the containing
      gcwq.  worker and cpu_workqueue_struct are updated to point to
      worker_pool instead of gcwq too.
      
      This change is mechanical and doesn't introduce any functional
      difference other than rearranging of fields and an added level of
      indirection in some places.  This is to prepare for multiple pools per
      gcwq.
      
      v2: Comment typo fixes as suggested by Namhyung.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
    • workqueue: don't use WQ_HIGHPRI for unbound workqueues · 974271c4
      Tejun Heo authored
      Unbound wqs aren't concurrency-managed and try to execute work items
      as soon as possible.  This is currently achieved by implicitly setting
      %WQ_HIGHPRI on all unbound workqueues; however, WQ_HIGHPRI
      implementation is about to be restructured and this usage won't be
      valid anymore.
      
      Add an explicit chain-wakeup path for unbound workqueues in
      process_one_work() instead of piggy backing on %WQ_HIGHPRI.
      Signed-off-by: Tejun Heo <tj@kernel.org>
  8. 15 May, 2012 2 commits
    • lockdep: fix oops in processing workqueue · 4d82a1de
      Peter Zijlstra authored
      Under memory load, on x86_64, with lockdep enabled, the workqueue's
      process_one_work() has been seen to oops in __lock_acquire(), barfing
      on a 0xffffffff00000000 pointer in the lockdep_map's class_cache[].
      
      Because it's permissible to free a work_struct from its callout function,
      the map used is an onstack copy of the map given in the work_struct: and
      that copy is made without any locking.
      
      Surprisingly, gcc (4.5.1 in Hugh's case) uses "rep movsl" rather than
      "rep movsq" for that structure copy: which might race with a workqueue
      user's wait_on_work() doing lock_map_acquire() on the source of the
      copy, putting a pointer into the class_cache[], but only in time for
      the top half of that pointer to be copied to the destination map.
      
      Boom when process_one_work() subsequently does lock_map_acquire()
      on its onstack copy of the lockdep_map.
      
      Fix this, and a similar instance in call_timer_fn(), with a
      lockdep_copy_map() function which additionally NULLs the class_cache[].
      
      Note: this oops was actually seen on 3.4-next, where flush_work() newly
      does the racing lock_map_acquire(); but Tejun points out that 3.4 and
      earlier are already vulnerable to the same through wait_on_work().
      
      * Patch originally from Peter.  Hugh modified it a bit and wrote the
        description.
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Reported-by: Hugh Dickins <hughd@google.com>
      LKML-Reference: <alpine.LSU.2.00.1205070951170.1544@eggly.anvils>
      Signed-off-by: Tejun Heo <tj@kernel.org>
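The idea behind `lockdep_copy_map()` can be shown with simplified types. This is a sketch under assumptions: the struct layouts, the `_model` names, and the cache size are hypothetical. It only demonstrates that the copy is followed by clearing every `class_cache[]` slot, so a pointer torn by a racing writer never survives in the destination.

```c
#include <assert.h>
#include <stddef.h>

#define NR_LOCKDEP_CACHING_CLASSES 2	/* hypothetical cache size */

/* Simplified stand-ins for the lockdep structures (assumptions). */
struct lock_class_model {
	int id;
};

struct lockdep_map_model {
	const char *name;
	struct lock_class_model *class_cache[NR_LOCKDEP_CACHING_CLASSES];
};

/* Copy the map without locking, then drop the possibly torn cache. */
static void lockdep_copy_map_model(struct lockdep_map_model *to,
				   const struct lockdep_map_model *from)
{
	int i;

	*to = *from;
	/*
	 * The source cache may be updated while it is being copied, so a
	 * copied slot could hold half of a pointer.  Resetting the slots
	 * makes the copy fall back to a fresh class lookup.
	 */
	for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
		to->class_cache[i] = NULL;
}
```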
    • workqueue: skip nr_running sanity check in worker_enter_idle() if trustee is active · 544ecf31
      Tejun Heo authored
      worker_enter_idle() has WARN_ON_ONCE() which triggers if nr_running
      isn't zero when every worker is idle.  This can trigger spuriously
      while a cpu is going down due to the way trustee sets %WORKER_ROGUE
      and zaps nr_running.
      
      It first sets %WORKER_ROGUE on all workers without updating
      nr_running, releases gcwq->lock, schedules, regrabs gcwq->lock and
      then zaps nr_running.  If the last running worker enters idle
      in between, it would see stale nr_running which hasn't been zapped yet
      and trigger the WARN_ON_ONCE().
      
      Fix it by performing the sanity check iff the trustee is idle.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
  9. 24 Apr, 2012 1 commit
    • workqueue: Catch more locking problems with flush_work() · 0976dfc1
      Stephen Boyd authored
      If a workqueue is flushed with flush_work() lockdep checking can
      be circumvented. For example:
      
       static DEFINE_MUTEX(mutex);
      
       static void my_work(struct work_struct *w)
       {
               mutex_lock(&mutex);
               mutex_unlock(&mutex);
       }
      
       static DECLARE_WORK(work, my_work);
      
       static int __init start_test_module(void)
       {
               schedule_work(&work);
               return 0;
       }
       module_init(start_test_module);
      
       static void __exit stop_test_module(void)
       {
               mutex_lock(&mutex);
               flush_work(&work);
               mutex_unlock(&mutex);
       }
       module_exit(stop_test_module);
      
      would not always print a warning when flush_work() was called.
      In this trivial example nothing could go wrong since we are
      guaranteed module_init() and module_exit() don't run concurrently,
      but if the work item is scheduled asynchronously we could have a
      scenario where the work item is running just at the time flush_work()
      is called resulting in a classic ABBA locking problem.
      
      Add a lockdep hint by acquiring and releasing the work item
      lockdep_map in flush_work() so that we always catch this
      potential deadlock scenario.
      Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
      Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  10. 17 Apr, 2012 1 commit
  11. 13 Mar, 2012 1 commit
  12. 02 Mar, 2012 1 commit
    • Block: use a freezable workqueue for disk-event polling · 62d3c543
      Alan Stern authored
      This patch (as1519) fixes a bug in the block layer's disk-events
      polling.  The polling is done by a work routine queued on the
      system_nrt_wq workqueue.  Since that workqueue isn't freezable, the
      polling continues even in the middle of a system sleep transition.
      
      Obviously, polling a suspended drive for media changes and such isn't
      a good thing to do; in the case of USB mass-storage devices it can
      lead to real problems requiring device resets and even re-enumeration.
      
      The patch fixes things by creating a new system-wide, non-reentrant,
      freezable workqueue and using it for disk-events polling.
      Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
      CC: <stable@kernel.org>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 11 Jan, 2012 1 commit
  14. 31 Oct, 2011 1 commit
  15. 15 Sep, 2011 1 commit
  16. 20 May, 2011 1 commit
    • workqueue: separate out drain_workqueue() from destroy_workqueue() · 9c5a2ba7
      Tejun Heo authored
      There are users which want to drain workqueues without destroying them.
      Separate out drain functionality from destroy_workqueue() into
      drain_workqueue() and make it accessible to workqueue users.
      
      To guarantee forward-progress, only chain queueing is allowed while
      drain is in progress.  If a new work item which isn't chained from the
      running or pending work items is queued while draining is in progress,
      WARN_ON_ONCE() is triggered.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
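The chain-queueing rule can be expressed as a small predicate. This is a hedged model, not the kernel code: `struct drain_wq_model`, `struct drain_worker_model` and both function names are hypothetical. It captures only the rule stated above: while a drain is in progress, a work item may be queued only from a work item that is itself running on the same workqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins (assumptions, not the kernel structures). */
struct drain_wq_model {
	bool draining;
};

struct drain_worker_model {
	/* Workqueue whose work item this worker is currently executing. */
	const struct drain_wq_model *current_wq;
};

/* True if the queuer is a worker executing a work item of @wq. */
static bool is_chained(const struct drain_worker_model *queuer,
		       const struct drain_wq_model *wq)
{
	return queuer != NULL && queuer->current_wq == wq;
}

/* Queueing is always allowed unless draining; then only chained work is. */
static bool may_queue(const struct drain_wq_model *wq,
		      const struct drain_worker_model *queuer)
{
	if (!wq->draining)
		return true;
	return is_chained(queuer, wq);	/* the kernel warns when this fails */
}
```

Rejecting unchained work bounds the amount of work a drain has to wait for, which is what preserves forward progress.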
  17. 30 Apr, 2011 1 commit
  18. 31 Mar, 2011 1 commit
  19. 25 Mar, 2011 1 commit
    • percpu: Always align percpu output section to PAGE_SIZE · 0415b00d
      Tejun Heo authored
      Percpu allocator honors alignment request up to PAGE_SIZE and both the
      percpu addresses in the percpu address space and the translated kernel
      addresses should be aligned accordingly.  The calculation of the
      former depends on the alignment of percpu output section in the kernel
      image.
      
      The linker script macros PERCPU_VADDR() and PERCPU() are used to
      define this output section and the latter takes @align parameter.
      Several architectures are using @align smaller than PAGE_SIZE breaking
      percpu memory alignment.
      
      This patch removes @align parameter from PERCPU(), renames it to
      PERCPU_SECTION() and makes it always align to PAGE_SIZE.  While at it,
      add PCPU_SETUP_BUG_ON() checks such that alignment problems are
      reliably detected and remove percpu alignment comment recently added
      in workqueue.c as the condition would trigger BUG way before reaching
      there.
      
      For um, this patch raises the alignment of percpu area.  As the area
      is in .init, there shouldn't be any noticeable difference.
      
      This problem was discovered by David Howells while debugging boot
      failure on mn10300.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Mike Frysinger <vapier@gentoo.org>
      Cc: uclinux-dist-devel@blackfin.uclinux.org
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: user-mode-linux-devel@lists.sourceforge.net
  20. 23 Mar, 2011 1 commit
  21. 08 Mar, 2011 1 commit
    • debugobjects: Add hint for better object identification · 99777288
      Stanislaw Gruszka authored
      In complex subsystems like mac80211 structures can contain several
      timers and work structs, so identifying a specific instance from the
      call trace and object type output of debugobjects can be hard.
      
      Allow the subsystems which support debugobjects to provide a hint
      function. This function returns a pointer to a kernel address
      (preferably the object's callback function) which is printed along
      with the debugobjects type.
      
      Add hint methods for timer_list, work_struct and hrtimer.
      
      [ tglx: Massaged changelog, made it compile ]
      Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <20110307085809.GA9334@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  22. 21 Feb, 2011 1 commit
  23. 17 Feb, 2011 2 commits
  24. 14 Feb, 2011 1 commit