Commit 4fcc712f, authored by Kent Overstreet, committed by Linus Torvalds

aio: fix io_destroy() regression by using call_rcu()

There was a regression introduced by 36f55889 ("aio: refcounting
cleanup"), reported by Jens Axboe: the refcounting cleanup switched to
using RCU in the shutdown path, but the synchronize_rcu() was done in
the context of the io_destroy() syscall, greatly increasing the time it
could block.

This patch switches to call_rcu() instead and makes shutdown asynchronous
(more asynchronous than it was originally: before the refcount changes,
io_destroy() would still wait on pending kiocbs).

Note that there's a global quota on the max outstanding kiocbs, and that
quota must be manipulated synchronously; otherwise io_setup() could
return -EAGAIN when there isn't quota available, and userspace won't
have any way of waiting until shutdown of the old kioctxs has finished
(besides busy looping).

So we release our quota before kioctx shutdown has finished, which
should be fine since the quota never corresponded to anything real
anyway.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Reported-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Parent: bba00e59
@@ -141,9 +141,6 @@ static void aio_free_ring(struct kioctx *ctx)
 	for (i = 0; i < ctx->nr_pages; i++)
 		put_page(ctx->ring_pages[i]);
 
-	if (ctx->mmap_size)
-		vm_munmap(ctx->mmap_base, ctx->mmap_size);
-
 	if (ctx->ring_pages && ctx->ring_pages != ctx->internal_pages)
 		kfree(ctx->ring_pages);
 }
@@ -322,11 +319,6 @@ static void free_ioctx(struct kioctx *ctx)
 
 	aio_free_ring(ctx);
 
-	spin_lock(&aio_nr_lock);
-	BUG_ON(aio_nr - ctx->max_reqs > aio_nr);
-	aio_nr -= ctx->max_reqs;
-	spin_unlock(&aio_nr_lock);
-
 	pr_debug("freeing %p\n", ctx);
 
 	/*
@@ -435,17 +427,24 @@ static void kill_ioctx(struct kioctx *ctx)
 {
 	if (!atomic_xchg(&ctx->dead, 1)) {
 		hlist_del_rcu(&ctx->list);
-		/* Between hlist_del_rcu() and dropping the initial ref */
-		synchronize_rcu();
 
 		/*
-		 * We can't punt to workqueue here because put_ioctx() ->
-		 * free_ioctx() will unmap the ringbuffer, and that has to be
-		 * done in the original process's context. kill_ioctx_rcu/work()
-		 * exist for exit_aio(), as in that path free_ioctx() won't do
-		 * the unmap.
+		 * It'd be more correct to do this in free_ioctx(), after all
+		 * the outstanding kiocbs have finished - but by then io_destroy
+		 * has already returned, so io_setup() could potentially return
+		 * -EAGAIN with no ioctxs actually in use (as far as userspace
+		 * could tell).
 		 */
-		kill_ioctx_work(&ctx->rcu_work);
+		spin_lock(&aio_nr_lock);
+		BUG_ON(aio_nr - ctx->max_reqs > aio_nr);
+		aio_nr -= ctx->max_reqs;
+		spin_unlock(&aio_nr_lock);
+
+		if (ctx->mmap_size)
+			vm_munmap(ctx->mmap_base, ctx->mmap_size);
+
+		/* Between hlist_del_rcu() and dropping the initial ref */
+		call_rcu(&ctx->rcu_head, kill_ioctx_rcu);
 	}
 }
 
@@ -495,10 +494,7 @@ void exit_aio(struct mm_struct *mm)
 		 */
 		ctx->mmap_size = 0;
 
-		if (!atomic_xchg(&ctx->dead, 1)) {
-			hlist_del_rcu(&ctx->list);
-			call_rcu(&ctx->rcu_head, kill_ioctx_rcu);
-		}
+		kill_ioctx(ctx);
 	}
 }
 
......