- 19 9月, 2012 4 次提交
-
-
由 Lai Jiangshan 提交于
Now that manager_mutex's role has changed from synchronizing manager role to excluding hotplug against manager, the name is misleading. As it is protecting the CPU-association of the gcwq now, rename it to assoc_mutex. This patch is pure rename and doesn't introduce any functional change. tj: Updated comments and description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Lai Jiangshan 提交于
Now both worker destruction and idle rebinding remove the worker from idle list while it's still idle, so list_empty(&worker->entry) can be used to test whether either is pending and WORKER_DIE to distinguish between the two instead making WORKER_REBIND unnecessary. Use list_empty(&worker->entry) to determine whether destruction or rebinding is pending. This simplifies worker state transitions. WORKER_REBIND is not needed anymore. Remove it. tj: Updated comments and description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Lai Jiangshan 提交于
Because the old unbind/rebinding implementation wasn't atomic w.r.t. GCWQ_DISASSOCIATED manipulation which is protected by global_cwq->lock, we had to use two flags, WORKER_UNBOUND and WORKER_REBIND, to avoid incorrectly losing all NOT_RUNNING bits with back-to-back CPU hotplug operations; otherwise, completion of rebinding while another unbinding is in progress could clear UNBIND prematurely. Now that both unbind/rebinding are atomic w.r.t. GCWQ_DISASSOCIATED, there's no need to use two flags. Just one is enough. Don't use WORKER_REBIND for busy rebinding. tj: Updated description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Lai Jiangshan 提交于
Currently rebind_workers() uses rebinds idle workers synchronously before proceeding to requesting busy workers to rebind. This is necessary because all workers on @worker_pool->idle_list must be bound before concurrency management local wake-ups from the busy workers take place. Unfortunately, the synchronous idle rebinding is quite complicated. This patch reimplements idle rebinding to simplify the code path. Rather than trying to make all idle workers bound before rebinding busy workers, we simply remove all to-be-bound idle workers from the idle list and let them add themselves back after completing rebinding (successful or not). As only workers which finished rebinding can on on the idle worker list, the idle worker list is guaranteed to have only bound workers unless CPU went down again and local wake-ups are safe. After the change, @worker_pool->nr_idle may deviate than the actual number of idle workers on @worker_pool->idle_list. More specifically, nr_idle may be non-zero while ->idle_list is empty. All users of ->nr_idle and ->idle_list are audited. The only affected one is too_many_workers() which is updated to check %false if ->idle_list is empty regardless of ->nr_idle. After this patch, rebind_workers() no longer performs the nasty idle-rebind retries which require temporary release of gcwq->lock, and both unbinding and rebinding are atomic w.r.t. global_cwq->lock. worker->idle_rebind and global_cwq->rebind_hold are now unnecessary and removed along with the definition of struct idle_rebind. Changed from V1: 1) remove unlikely from too_many_workers(), ->idle_list can be empty anytime, even before this patch, no reason to use unlikely. 2) fix a small rebasing mistake. (which is from rebasing the orignal fixing patch to for-next) 3) add a lot of comments. 4) clear WORKER_REBIND unconditionaly in idle_worker_rebind() tj: Updated comments and description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 18 9月, 2012 1 次提交
-
-
由 Lai Jiangshan 提交于
busy_worker_rebind_fn() didn't clear WORKER_REBIND if rebinding failed (CPU is down again). This used to be okay because the flag wasn't used for anything else. However, after 25511a47 "workqueue: reimplement CPU online rebinding to handle idle workers", WORKER_REBIND is also used to command idle workers to rebind. If not cleared, the worker may confuse the next CPU_UP cycle by having REBIND spuriously set or oops / get stuck by prematurely calling idle_worker_rebind(). WARNING: at /work/os/wq/kernel/workqueue.c:1323 worker_thread+0x4cd/0x5 00() Hardware name: Bochs Modules linked in: test_wq(O-) Pid: 33, comm: kworker/1:1 Tainted: G O 3.6.0-rc1-work+ #3 Call Trace: [<ffffffff8109039f>] warn_slowpath_common+0x7f/0xc0 [<ffffffff810903fa>] warn_slowpath_null+0x1a/0x20 [<ffffffff810b3f1d>] worker_thread+0x4cd/0x500 [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 ---[ end trace e977cf20f4661968 ]--- BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff810b3db0>] worker_thread+0x360/0x500 PGD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: test_wq(O-) CPU 0 Pid: 33, comm: kworker/1:1 Tainted: G W O 3.6.0-rc1-work+ #3 Bochs Bochs RIP: 0010:[<ffffffff810b3db0>] [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP: 0018:ffff88001e1c9de0 EFLAGS: 00010086 RAX: 0000000000000000 RBX: ffff88001e633e00 RCX: 0000000000004140 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000009 RBP: ffff88001e1c9ea0 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000002 R11: 0000000000000000 R12: ffff88001fc8d580 R13: ffff88001fc8d590 R14: ffff88001e633e20 R15: ffff88001e1c6900 FS: 0000000000000000(0000) GS:ffff88001fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 00000000130e8000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kworker/1:1 (pid: 33, threadinfo ffff88001e1c8000, task ffff88001e1c6900) Stack: ffff880000000000 ffff88001e1c9e40 0000000000000001 ffff88001e1c8010 ffff88001e519c78 ffff88001e1c9e58 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001e1c6900 ffff88001fc8d340 ffff88001fc8d340 Call Trace: [<ffffffff810bc16e>] kthread+0xbe/0xd0 [<ffffffff81bd2664>] kernel_thread_helper+0x4/0x10 Code: b1 00 f6 43 48 02 0f 85 91 01 00 00 48 8b 43 38 48 89 df 48 8b 00 48 89 45 90 e8 ac f0 ff ff 3c 01 0f 85 60 01 00 00 48 8b 53 50 <8b> 02 83 e8 01 85 c0 89 02 0f 84 3b 01 00 00 48 8b 43 38 48 8b RIP [<ffffffff810b3db0>] worker_thread+0x360/0x500 RSP <ffff88001e1c9de0> CR2: 0000000000000000 There was no reason to keep WORKER_REBIND on failure in the first place - WORKER_UNBOUND is guaranteed to be set in such cases preventing incorrectly activating concurrency management. Always clear WORKER_REBIND. tj: Updated comment and description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 11 9月, 2012 2 次提交
-
-
由 Lai Jiangshan 提交于
To simplify both normal and CPU hotplug paths, worker management is prevented while CPU hoplug is in progress. This is achieved by CPU hotplug holding the same exclusion mechanism used by workers to ensure there's only one manager per pool. If someone else seems to be performing the manager role, workers proceed to execute work items. CPU hotplug using the same mechanism can lead to idle worker depletion because all workers could proceed to execute work items while CPU hotplug is in progress and CPU hotplug itself wouldn't actually perform the worker management duty - it doesn't guarantee that there's an idle worker left when it releases management. This idle worker depletion, under extreme circumstances, can break forward-progress guarantee and thus lead to deadlock. This patch fixes the bug by using separate mechanisms for manager exclusion among workers and hotplug exclusion. For manager exclusion, POOL_MANAGING_WORKERS which was restored by the previous patch is used. pool->manager_mutex is now only used for exclusion between the elected manager and CPU hotplug. The elected manager won't proceed without holding pool->manager_mutex. This ensures that the worker which won the manager position can't skip managing while CPU hotplug is in progress. It will block on manager_mutex and perform management after CPU hotplug is complete. Note that hotplug may happen while waiting for manager_mutex. A manager isn't either on idle or busy list and thus the hoplug code can't unbind/rebind it. Make the manager handle its own un/rebinding. tj: Updated comment and description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Lai Jiangshan 提交于
This patch restores POOL_MANAGING_WORKERS which was replaced by pool->manager_mutex by 60373152 "workqueue: use mutex for global_cwq manager exclusion". There's a subtle idle worker depletion bug across CPU hotplug events and we need to distinguish an actual manager and CPU hotplug preventing management. POOL_MANAGING_WORKERS will be used for the former and manager_mutex the later. This patch just lays POOL_MANAGING_WORKERS on top of the existing manager_mutex and doesn't introduce any synchronization changes. The next patch will update it. Note that this patch fixes a non-critical anomaly where too_many_workers() may return %true spuriously while CPU hotplug is in progress. While the issue could schedule idle timer spuriously, it didn't trigger any actual misbehavior. tj: Rewrote patch description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 06 9月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
Currently, rebind_workers() and idle_worker_rebind() are two-way interlocked. rebind_workers() waits for idle workers to finish rebinding and rebound idle workers wait for rebind_workers() to finish rebinding busy workers before proceeding. Unfortunately, this isn't enough. The second wait from idle workers is implemented as follows. wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND)); rebind_workers() clears WORKER_REBIND, wakes up the idle workers and then returns. If CPU hotplug cycle happens again before one of the idle workers finishes the above wait_event(), rebind_workers() will repeat the first part of the handshake - set WORKER_REBIND again and wait for the idle worker to finish rebinding - and this leads to deadlock because the idle worker would be waiting for WORKER_REBIND to clear. This is fixed by adding another interlocking step at the end - rebind_workers() now waits for all the idle workers to finish the above WORKER_REBIND wait before returning. This ensures that all rebinding steps are complete on all idle workers before the next hotplug cycle can happen. This problem was diagnosed by Lai Jiangshan who also posted a patch to fix the issue, upon which this patch is based. This is the minimal fix and further patches are scheduled for the next merge window to simplify the CPU hotplug path. Signed-off-by: NTejun Heo <tj@kernel.org> Original-patch-by: NLai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <1346516916-1991-3-git-send-email-laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
This doesn't make any functional difference and is purely to help the next patch to be simpler. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
-
- 05 9月, 2012 1 次提交
-
-
由 Lai Jiangshan 提交于
The compiler may compile the following code into TWO write/modify instructions. worker->flags &= ~WORKER_UNBOUND; worker->flags |= WORKER_REBIND; so the other CPU may temporarily see worker->flags which doesn't have either WORKER_UNBOUND or WORKER_REBIND set and perform local wakeup prematurely. Fix it by using single explicit assignment via ACCESS_ONCE(). Because idle workers have another WORKER_NOT_RUNNING flag, this bug doesn't exist for them; however, update it to use the same pattern for consistency. tj: Applied the change to idle workers too and updated comments and patch description a bit. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
-
- 22 8月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
cancel_delayed_work() can't be called from IRQ handlers due to its use of del_timer_sync() and can't cancel work items which are already transferred from timer to worklist. Also, unlike other flush and cancel functions, a canceled delayed_work would still point to the last associated cpu_workqueue. If the workqueue is destroyed afterwards and the work item is re-used on a different workqueue, the queueing code can oops trying to dereference already freed cpu_workqueue. This patch reimplements cancel_delayed_work() using try_to_grab_pending() and set_work_cpu_and_clear_pending(). This allows the function to be called from IRQ handlers and makes its behavior consistent with other flush / cancel functions. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>
-
由 Tejun Heo 提交于
Up to now, for delayed_works, try_to_grab_pending() couldn't be used from IRQ handlers because IRQs may happen while delayed_work_timer_fn() is in progress leading to indefinite -EAGAIN. This patch makes delayed_work use the new TIMER_IRQSAFE flag for delayed_work->timer. This makes try_to_grab_pending() and thus mod_delayed_work_on() safe to call from IRQ handlers. Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 21 8月, 2012 4 次提交
-
-
由 Tejun Heo 提交于
Now that all workqueues are non-reentrant, system[_freezable]_wq() are equivalent to system_nrt[_freezable]_wq(). Replace the latter with wrappers around system[_freezable]_wq(). The wrapping goes through inline functions so that __deprecated can be added easily. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Now that all workqueues are non-reentrant, flush[_delayed]_work_sync() are equivalent to flush[_delayed]_work(). Drop the separate implementation and make them thin wrappers around flush[_delayed]_work(). * start_flush_work() no longer takes @wait_executing as the only left user - flush_work() - always sets it to %true. * __cancel_work_timer() uses flush_work() instead of wait_on_work(). Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
By default, each per-cpu part of a bound workqueue operates separately and a work item may be executing concurrently on different CPUs. The behavior avoids some cross-cpu traffic but leads to subtle weirdities and not-so-subtle contortions in the API. * There's no sane usefulness in allowing a single work item to be executed concurrently on multiple CPUs. People just get the behavior unintentionally and get surprised after learning about it. Most either explicitly synchronize or use non-reentrant/ordered workqueue but this is error-prone. * flush_work() can't wait for multiple instances of the same work item on different CPUs. If a work item is executing on cpu0 and then queued on cpu1, flush_work() can only wait for the one on cpu1. Unfortunately, work items can easily cross CPU boundaries unintentionally when the queueing thread gets migrated. This means that if multiple queuers compete, flush_work() can't even guarantee that the instance queued right before it is finished before returning. * flush_work_sync() was added to work around some of the deficiencies of flush_work(). In addition to the usual flushing, it ensures that all currently executing instances are finished before returning. This operation is expensive as it has to walk all CPUs and at the same time fails to address competing queuer case. Incorrectly using flush_work() when flush_work_sync() is necessary is an easy error to make and can lead to bugs which are difficult to reproduce. * Similar problems exist for flush_delayed_work[_sync](). Other than the cross-cpu access concern, there's no benefit in allowing parallel execution and it's plain silly to have this level of contortion for workqueue which is widely used from core code to extremely obscure drivers. This patch makes all workqueues non-reentrant. If a work item is executing on a different CPU when queueing is requested, it is always queued to that CPU. This guarantees that any given work item can be executing on one CPU at maximum and if a work item is queued and executing, both are on the same CPU. The only behavior change which may affect workqueue users negatively is that non-reentrancy overrides the affinity specified by queue_work_on(). On a reentrant workqueue, the affinity specified by queue_work_on() is always followed. Now, if the work item is executing on one of the CPUs, the work item will be queued there regardless of the requested affinity. I've reviewed all workqueue users which request explicit affinity, and, fortunately, none seems to be crazy enough to exploit parallel execution of the same work item. This adds an additional busy_hash lookup if the work item was previously queued on a different CPU. This shouldn't be noticeable under any sane workload. Work item queueing isn't a very high-frequency operation and they don't jump across CPUs all the time. In a micro benchmark to exaggerate this difference - measuring the time it takes for two work items to repeatedly jump between two CPUs a number (10M) of times with busy_hash table densely populated, the difference was around 3%. While the overhead is measureable, it is only visible in pathological cases and the difference isn't huge. This change brings much needed sanity to workqueue and makes its behavior consistent with timer. I think this is the right tradeoff to make. This enables significant simplification of workqueue API. Simplification patches will follow. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Valentin Ilie 提交于
Fixed some checkpatch warnings. tj: adapted to wq/for-3.7 and massaged pr_xxx() format strings a bit. Signed-off-by: NValentin Ilie <valentin.ilie@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org> LKML-Reference: <1345326762-21747-1-git-send-email-valentin.ilie@gmail.com>
-
- 17 8月, 2012 6 次提交
-
-
由 Joonsoo Kim 提交于
To speed cpu down processing up, use system_highpri_wq. As scheduling priority of workers on it is higher than system_wq and it is not contended by other normal works on this cpu, work on it is processed faster than system_wq. tj: CPU up/downs care quite a bit about latency these days. This shouldn't hurt anything and makes sense. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Joonsoo Kim 提交于
In rebind_workers(), we do inserting a work to rebind to cpu for busy workers. Currently, in this case, we use only system_wq. This makes a possible error situation as there is mismatch between cwq->pool and worker->pool. To prevent this, we should use system_highpri_wq for highpri worker to match theses. This implements it. tj: Rephrased comment a bit. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Joonsoo Kim 提交于
Commit 3270476a ('workqueue: reimplement WQ_HIGHPRI using a separate worker_pool') introduce separate worker pool for HIGHPRI. When we handle busyworkers for gcwq, it can be normal worker or highpri worker. But, we don't consider this difference in rebind_workers(), we use just system_wq for highpri worker. It makes mismatch between cwq->pool and worker->pool. It doesn't make error in current implementation, but possible in the future. Now, we introduce system_highpri_wq to use proper cwq for highpri workers in rebind_workers(). Following patch fix this issue properly. tj: Even apart from rebinding, having system_highpri_wq generally makes sense. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Joonsoo Kim 提交于
We assign cpu id into work struct's data field in __queue_delayed_work_on(). In current implementation, when work is come in first time, current running cpu id is assigned. If we do __queue_delayed_work_on() with CPU A on CPU B, __queue_work() invoked in delayed_work_timer_fn() go into the following sub-optimal path in case of WQ_NON_REENTRANT. gcwq = get_gcwq(cpu); if (wq->flags & WQ_NON_REENTRANT && (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) { Change lcpu to @cpu and rechange lcpu to local cpu if lcpu is WORK_CPU_UNBOUND. It is sufficient to prevent to go into sub-optimal path. tj: Slightly rephrased the comment. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Joonsoo Kim 提交于
When we do tracing workqueue_queue_work(), it records requested cpu. But, if !(@wq->flag & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND, requested cpu is changed as local cpu. In case of @wq->flag & WQ_UNBOUND, above change is not occured, therefore it is reasonable to correct it. Use temporary local variable for storing requested cpu. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Joonsoo Kim 提交于
Commit 3270476a ('workqueue: reimplement WQ_HIGHPRI using a separate worker_pool') introduce separate worker_pool for HIGHPRI. Although there is NR_WORKER_POOLS enum value which represent size of pools, definition of worker_pool in gcwq doesn't use it. Using it makes code robust and prevent future mistakes. So change code to use this enum value. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 14 8月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
Any operation which clears PENDING should be preceded by a wmb to guarantee that the next PENDING owner sees all the changes made before PENDING release. There are only two places where PENDING is cleared - set_work_cpu_and_clear_pending() and clear_work_data(). The caller of the former already does smp_wmb() but the latter doesn't have any. Move the wmb above set_work_cpu_and_clear_pending() into it and add one to clear_work_data(). There hasn't been any report related to this issue, and, given how clear_work_data() is used, it is extremely unlikely to have caused any actual problems on any architecture. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com>
-
由 Tejun Heo 提交于
delayed_work encodes the workqueue to use and the last CPU in delayed_work->work.data while it's on timer. The target CPU is implicitly recorded as the CPU the timer is queued on and delayed_work_timer_fn() queues delayed_work->work to the CPU it is running on. Unfortunately, this leaves flush_delayed_work[_sync]() no way to find out which CPU the delayed_work was queued for when they try to re-queue after killing the timer. Currently, it chooses the local CPU flush is running on. This can unexpectedly move a delayed_work queued on a specific CPU to another CPU and lead to subtle errors. There isn't much point in trying to save several bytes in struct delayed_work, which is already close to a hundred bytes on 64bit with all debug options turned off. This patch adds delayed_work->cpu to remember the CPU it's queued for. Note that if the timer is migrated during CPU down, the work item could be queued to the downed global_cwq after this change. As a detached global_cwq behaves like an unbound one, this doesn't change much for the delayed_work. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org>
-
- 04 8月, 2012 13 次提交
-
-
由 Tejun Heo 提交于
Workqueue was lacking a mechanism to modify the timeout of an already pending delayed_work. delayed_work users have been working around this using several methods - using an explicit timer + work item, messing directly with delayed_work->timer, and canceling before re-queueing, all of which are error-prone and/or ugly. This patch implements mod_delayed_work[_on]() which behaves similarly to mod_timer() - if the delayed_work is idle, it's queued with the given delay; otherwise, its timeout is modified to the new value. Zero @delay guarantees immediate execution. v2: Updated to reflect try_to_grab_pending() changes. Now safe to be called from bh context. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com>
-
由 Tejun Heo 提交于
There can be two reasons try_to_grab_pending() can fail with -EAGAIN. One is when someone else is queueing or deqeueing the work item. With the previous patches, it is guaranteed that PENDING and queued state will soon agree making it safe to busy-retry in this case. The other is if multiple __cancel_work_timer() invocations are racing one another. __cancel_work_timer() grabs PENDING and then waits for running instances of the target work item on all CPUs while holding PENDING and !queued. try_to_grab_pending() invoked from another task will keep returning -EAGAIN while the current owner is waiting. Not distinguishing the two cases is okay because __cancel_work_timer() is the only user of try_to_grab_pending() and it invokes wait_on_work() whenever grabbing fails. For the first case, busy looping should be fine but wait_on_work() doesn't cause any critical problem. For the latter case, the new contender usually waits for the same condition as the current owner, so no unnecessarily extended busy-looping happens. Combined, these make __cancel_work_timer() technically correct even without irq protection while grabbing PENDING or distinguishing the two different cases. While the current code is technically correct, not distinguishing the two cases makes it difficult to use try_to_grab_pending() for other purposes than canceling because it's impossible to tell whether it's safe to busy-retry grabbing. This patch adds a mechanism to mark a work item being canceled. try_to_grab_pending() now disables irq on success and returns -EAGAIN to indicate that grabbing failed but PENDING and queued states are gonna agree soon and it's safe to busy-loop. It returns -ENOENT if the work item is being canceled and it may stay PENDING && !queued for arbitrary amount of time. __cancel_work_timer() is modified to mark the work canceling with WORK_OFFQ_CANCELING after grabbing PENDING, thus making try_to_grab_pending() fail with -ENOENT instead of -EAGAIN. Also, it invokes wait_on_work() iff grabbing failed with -ENOENT. This isn't necessary for correctness but makes it consistent with other future users of try_to_grab_pending(). v2: try_to_grab_pending() was testing preempt_count() to ensure that the caller has disabled preemption. This triggers spuriously if !CONFIG_PREEMPT_COUNT. Use preemptible() instead. Reported by Fengguang Wu. v3: Updated so that try_to_grab_pending() disables irq on success rather than requiring preemption disabled by the caller. This makes busy-looping easier and will allow try_to_grap_pending() to be used from bh/irq contexts. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Fengguang Wu <fengguang.wu@intel.com>
-
由 Tejun Heo 提交于
* Use bool @is_dwork instead of @timer and let try_to_grab_pending() use to_delayed_work() to determine the delayed_work address. * Move timer handling from __cancel_work_timer() to try_to_grab_pending(). * Make try_to_grab_pending() use -EAGAIN instead of -1 for busy-looping and drop the ret local variable. * Add proper function comment to try_to_grab_pending(). This makes the code a bit easier to understand and will ease further changes. This patch doesn't make any functional change. v2: Use @is_dwork instead of @timer. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
This is to prepare for mod_delayed_work[_on]() and doesn't cause any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Low WORK_STRUCT_FLAG_BITS bits of work_struct->data contain WORK_STRUCT_FLAG_* and flush color. If the work item is queued, the rest point to the cpu_workqueue with WORK_STRUCT_CWQ set; otherwise, WORK_STRUCT_CWQ is clear and the bits contain the last CPU number - either a real CPU number or one of WORK_CPU_*. Scheduled addition of mod_delayed_work[_on]() requires an additional flag, which is used only while a work item is off queue. There are more than enough bits to represent off-queue CPU number on both 32 and 64bits. This patch introduces WORK_OFFQ_FLAG_* which occupy the lower part of the @work->data high bits while off queue. This patch doesn't define any actual OFFQ flag yet. Off-queue CPU number is now shifted by WORK_OFFQ_CPU_SHIFT, which adds the number of bits used by OFFQ flags to WORK_STRUCT_FLAG_SHIFT, to make room for OFFQ flags. To avoid shift width warning with large WORK_OFFQ_FLAG_BITS, ulong cast is added to WORK_STRUCT_NO_CPU and, just in case, BUILD_BUG_ON() to check that there are enough bits to accomodate off-queue CPU number is added. This patch doesn't make any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
try_to_grab_pending() will be used by to-be-implemented mod_delayed_work[_on](). Move try_to_grab_pending() and related functions above queueing functions. This patch only moves functions around. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
If @delay is zero and the dealyed_work is idle, queue_delayed_work() queues it for immediate execution; however, queue_delayed_work_on() lacks this logic and always goes through timer regardless of @delay. This patch moves 0 @delay handling logic from queue_delayed_work() to queue_delayed_work_on() so that both functions behave the same. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Queueing functions have been using different methods to determine the local CPU. * queue_work() superflously uses get/put_cpu() to acquire and hold the local CPU across queue_work_on(). * delayed_work_timer_fn() uses smp_processor_id(). * queue_delayed_work() calls queue_delayed_work_on() with -1 @cpu which is interpreted as the local CPU. * flush_delayed_work[_sync]() were using raw_smp_processor_id(). * __queue_work() interprets %WORK_CPU_UNBOUND as local CPU if the target workqueue is bound one but nobody uses this. This patch converts all functions to uniformly use %WORK_CPU_UNBOUND to indicate local CPU and use the local binding feature of __queue_work(). unlikely() is dropped from %WORK_CPU_UNBOUND handling in __queue_work(). Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
delayed_work->timer.function is currently initialized during queue_delayed_work_on(). Export delayed_work_timer_fn() and set delayed_work timer function during delayed_work initialization together with other fields. This ensures the timer function is always valid on an initialized delayed_work. This is to help mod_delayed_work() implementation. To detect delayed_work users which diddle with the internal timer, trigger WARN if timer function doesn't match on queue. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Queueing operations use WORK_STRUCT_PENDING_BIT to synchronize access to the target work item. They first try to claim the bit and proceed with queueing only after that succeeds and there's a window between PENDING being set and the actual queueing where the task can be interrupted or preempted. There's also a similar window in process_one_work() when clearing PENDING. A work item is dequeued, gcwq->lock is released and then PENDING is cleared and the worker might get interrupted or preempted between releasing gcwq->lock and clearing PENDING. cancel[_delayed]_work_sync() tries to claim or steal PENDING. The function assumes that a work item with PENDING is either queued or in the process of being [de]queued. In the latter case, it busy-loops until either the work item loses PENDING or is queued. If canceling coincides with the above described interrupts or preemptions, the canceling task will busy-loop while the queueing or executing task is preempted. This patch keeps irq disabled across claiming PENDING and actual queueing and moves PENDING clearing in process_one_work() inside gcwq->lock so that busy looping from PENDING && !queued doesn't wait for interrupted/preempted tasks. Note that, in process_one_work(), setting last CPU and clearing PENDING got merged into single operation. This removes possible long busy-loops and will allow using try_to_grab_pending() from bh and irq contexts. v2: __queue_work() was testing preempt_count() to ensure that the caller has disabled preemption. This triggers spuriously if !CONFIG_PREEMPT_COUNT. Use preemptible() instead. Reported by Fengguang Wu. v3: Disable irq instead of preemption. IRQ will be disabled while grabbing gcwq->lock later anyway and this allows using try_to_grab_pending() from bh and irq contexts. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Fengguang Wu <fengguang.wu@intel.com>
-
由 Tejun Heo 提交于
WORK_STRUCT_PENDING is used to claim ownership of a work item and process_one_work() releases it before starting execution. When someone else grabs PENDING, all pre-release updates to the work item should be visible and all updates made by the new owner should happen afterwards. Grabbing PENDING uses test_and_set_bit() and thus has a full barrier; however, clearing doesn't have a matching wmb. Given the preceding spin_unlock and use of clear_bit, I don't believe this can be a problem on an actual machine and there hasn't been any related report but it still is theretically possible for clear_pending to permeate upwards and happen before work->entry update. Add an explicit smp_wmb() before work_clear_pending(). Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: stable@vger.kernel.org
-
由 Tejun Heo 提交于
All queueing functions return 1 on success, 0 if the work item was already pending. Update them to return bool instead. This signifies better that they don't return 0 / -errno. This is cleanup and doesn't cause any functional difference. While at it, fix comment opening for schedule_work_on(). Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Currently, queue/schedule[_delayed]_work_on() are located below the counterpart without the _on postifx even though the latter is usually implemented using the former. Swap them. This is cleanup and doesn't cause any functional difference. Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 23 7月, 2012 1 次提交
-
-
由 Tejun Heo 提交于
25511a47 "workqueue: reimplement CPU online rebinding to handle idle workers" added CPU locality sanity check in process_one_work(). It triggers if a worker is executing on a different CPU without UNBOUND or REBIND set. This works for all normal workers but rescuers can trigger this spuriously when they're serving the unbound or a disassociated global_cwq - rescuers don't have either flag set and thus its gcwq->cpu can be a different value including %WORK_CPU_UNBOUND. Fix it by additionally testing %GCWQ_DISASSOCIATED. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> LKML-Refence: <20120721213656.GA7783@linux.vnet.ibm.com>
-
- 18 7月, 2012 2 次提交
-
-
由 Tejun Heo 提交于
With trustee gone, CPU hotplug code can be simplified. * gcwq_claim/release_management() now grab and release gcwq lock too respectively and gained _and_lock and _and_unlock postfixes. * All CPU hotplug logic was implemented in workqueue_cpu_callback() which was called by workqueue_cpu_up/down_callback() for the correct priority. This was because up and down paths shared a lot of logic, which is no longer true. Remove workqueue_cpu_callback() and move all hotplug logic into the two actual callbacks. This patch doesn't make any functional changes. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
-
由 Tejun Heo 提交于
With the previous changes, a disassociated global_cwq now can run as an unbound one on its own - it can create workers as necessary to drain remaining works after the CPU has been brought down and manage the number of workers using the usual idle timer mechanism making trustee completely redundant except for the actual unbinding operation. This patch removes the trustee and let a disassociated global_cwq manage itself. Unbinding is moved to a work item (for CPU affinity) which is scheduled and flushed from CPU_DONW_PREPARE. This patch moves nr_running clearing outside gcwq and manager locks to simplify the code. As nr_running is unused at the point, this is safe. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
-