Commit 5e5cfac0 authored by Asias He, committed by Jens Axboe

block: Mitigate lock unbalance caused by lock switching

Commit 777eb1bf disconnects the externally supplied queue_lock before
blk_drain_queue(). Switching the lock there introduces lock unbalance
because threads which have taken the external lock might unlock the
internal lock during the queue drain. This patch mitigates the problem
by disconnecting the lock after the queue is drained, since the drain
makes a lot of request_queue users go away.

However, please note that this patch only makes the problem less likely
to happen. Anyone who still holds a ref might try to issue a new request
on a dead queue after blk_cleanup_queue() finishes draining; the lock
unbalance might still happen in that case.
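To see why the delayed switch is only a mitigation, the same kind of
mock-up can model a lingering user that survived the drain: it reads the
lock pointer just before the cleanup path switches it. Again a
hypothetical userspace sketch, with semaphores forcing the bad
interleaving deterministically (unlocking a normal pthread mutex the
thread does not own is undefined; it stands in for the unbalanced
spin_unlock_irq() here).

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static pthread_mutex_t external_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t internal_lock = PTHREAD_MUTEX_INITIALIZER; /* q->__queue_lock */
static pthread_mutex_t *queue_lock = &external_lock;		  /* q->queue_lock  */
static sem_t locked, switched;

/* A user that still holds a ref after the drain and issues one more
 * request against the dead queue. */
static void *late_submitter(void *unused)
{
	(void)unused;
	pthread_mutex_lock(queue_lock);		/* still sees external_lock */
	sem_post(&locked);
	sem_wait(&switched);			/* cleanup runs meanwhile */
	pthread_mutex_unlock(queue_lock);	/* unlocks internal_lock instead */
	return NULL;
}

int main(void)
{
	pthread_t t;

	sem_init(&locked, 0, 0);
	sem_init(&switched, 0, 0);
	pthread_create(&t, NULL, late_submitter, NULL);

	sem_wait(&locked);			/* submitter holds external_lock */
	/* tail of blk_cleanup_queue(), after the drain */
	if (queue_lock != &internal_lock)
		queue_lock = &internal_lock;
	sem_post(&switched);
	pthread_join(t, NULL);

	/* EBUSY here shows external_lock was never released */
	printf("trylock(external_lock) = %d\n",
	       pthread_mutex_trylock(&external_lock));
	return 0;
}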

 =====================================
 [ BUG: bad unlock balance detected! ]
 3.4.0+ #288 Not tainted
 -------------------------------------
 fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
 [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
 but there are no more locks to release!

 other info that might help us debug this:
 1 lock held by fio/17706:
  #0:  (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
 get_request_wait+0x19a/0x250

 stack backtrace:
 Pid: 17706, comm: fio Not tainted 3.4.0+ #288
 Call Trace:
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
  [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
  [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
  [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
  [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
  [<ffffffff810e0079>] lock_release+0xd9/0x250
  [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
  [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
  [<ffffffff81328faa>] generic_make_request+0xca/0x100
  [<ffffffff81329056>] submit_bio+0x76/0xf0
  [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
  [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
  [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
  [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
  [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
  [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
  [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
  [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
  [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
  [<ffffffff811e8250>] ? aio_fsync+0x30/0x30
  [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
  [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
  [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
  [<ffffffff811eae50>] sys_io_submit+0x10/0x20
  [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b

Changes since v2: Update commit log to explain how the code is still
                  broken even if we delay the lock switching until after
                  the drain.
Changes since v1: Update commit log as Tejun suggested.
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Parent 458f27a9
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -480,7 +480,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	/* mark @q DEAD, no new request or merges will be allowed afterwards */
 	mutex_lock(&q->sysfs_lock);
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
-
 	spin_lock_irq(lock);
 
 	/*
@@ -498,10 +497,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set(QUEUE_FLAG_NOMERGES, q);
 	queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
 	queue_flag_set(QUEUE_FLAG_DEAD, q);
-
-	if (q->queue_lock != &q->__queue_lock)
-		q->queue_lock = &q->__queue_lock;
-
 	spin_unlock_irq(lock);
 	mutex_unlock(&q->sysfs_lock);
 
@@ -512,6 +507,11 @@ void blk_cleanup_queue(struct request_queue *q)
 	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
 	blk_sync_queue(q);
 
+	spin_lock_irq(lock);
+	if (q->queue_lock != &q->__queue_lock)
+		q->queue_lock = &q->__queue_lock;
+	spin_unlock_irq(lock);
+
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }