[PATCH] Fix lock inversion aio_kick_handler()

lockdep found a AB BC CA lock inversion in retry-based AIO: 1) The task struct's alloc_lock (A) is acquired in process context with interrupts enabled. An interrupt might arrive and call wake_up() which grabs the wait queue's q->lock (B). 2) When performing retry-based AIO the AIO core registers aio_wake_function() as the wake funtion for iocb->ki_wait. It is called with the wait queue's q->lock (B) held and then tries to add the iocb to the run list after acquiring the ctx_lock (C). 3) aio_kick_handler() holds the ctx_lock (C) while acquiring the alloc_lock (A) via lock_task() and unuse_mm(). Lockdep emits a warning saying that we're trying to connect the irq-safe q->lock to the irq-unsafe alloc_lock via ctx_lock. This fixes the inversion by calling unuse_mm() in the AIO kick handing path after we've released the ctx_lock. As Ben LaHaise pointed out __put_ioctx could set ctx->mm to NULL, so we must only access ctx->mm while we have the lock. Signed-off-by: N Zach Brown <zach.brown@oracle.com> Signed-off-by: N Suparna Bhattacharya <suparna@in.ibm.com> Acked-by: N Benjamin LaHaise <bcrl@kvack.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: N Andrew Morton <akpm@osdl.org> Signed-off-by: N Linus Torvalds <torvalds@osdl.org>

[PATCH] Fix lock inversion aio_kick_handler()
lockdep found a AB BC CA lock inversion in retry-based AIO: 1) The task struct's alloc_lock (A) is acquired in process context with interrupts enabled. An interrupt might arrive and call wake_up() which grabs the wait queue's q->lock (B). 2) When performing retry-based AIO the AIO core registers aio_wake_function() as the wake funtion for iocb->ki_wait. It is called with the wait queue's q->lock (B) held and then tries to add the iocb to the run list after acquiring the ctx_lock (C). 3) aio_kick_handler() holds the ctx_lock (C) while acquiring the alloc_lock (A) via lock_task() and unuse_mm(). Lockdep emits a warning saying that we're trying to connect the irq-safe q->lock to the irq-unsafe alloc_lock via ctx_lock. This fixes the inversion by calling unuse_mm() in the AIO kick handing path after we've released the ctx_lock. As Ben LaHaise pointed out __put_ioctx could set ctx->mm to NULL, so we must only access ctx->mm while we have the lock. Signed-off-by: N Zach Brown <zach.brown@oracle.com> Signed-off-by: N Suparna Bhattacharya <suparna@in.ibm.com> Acked-by: N Benjamin LaHaise <bcrl@kvack.org> Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com> Signed-off-by: N Andrew Morton <akpm@osdl.org> Signed-off-by: N Linus Torvalds <torvalds@osdl.org>
1ebb1101 · Zach Brown · Linus Torvalds · 43cdff92 · 1ebb1101
隐藏空白更改
内联并排

Showing with 3 addition and 4 deletion

fs/aio.c fs/aio.c +3 -4

未找到文件。
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -599,9 +599,6 @@ static void use_mm(struct mm_struct *mm)
 *	by the calling kernel thread
 *	(Note: this routine is intended to be called only
 *	from a kernel thread context)
- *
- * Comments: Called with ctx->ctx_lock held. This nests
- * task_lock instead ctx_lock.
 */
 static void unuse_mm(struct mm_struct *mm)
 {
@@ -850,14 +847,16 @@ static void aio_kick_handler(struct work_struct *work)
 {
 	struct kioctx *ctx = container_of(work, struct kioctx, wq.work);
 	mm_segment_t oldfs = get_fs();
+	struct mm_struct *mm;
 	int requeue;

 	set_fs(USER_DS);
 	use_mm(ctx->mm);
 	spin_lock_irq(&ctx->ctx_lock);
 	requeue =__aio_run_iocbs(ctx);
- 	unuse_mm(ctx->mm);
+	mm = ctx->mm;
 	spin_unlock_irq(&ctx->ctx_lock);
+ 	unuse_mm(mm);
 	set_fs(oldfs);
 	/*
 	 * we're in a worker thread already, don't use queue_delayed_work,