提交 · 43eaaaef0e18817bf78d8f135993f8579cad2cc6 · openeuler / qemu

23 7月, 2019 1 次提交

block: Only the main loop can change AioContexts · 43eaaaef

由 Max Reitz 提交于 7月 22, 2019

bdrv_set_aio_context_ignore() can only work in the main loop:
bdrv_drained_begin() only works in the main loop and the node's (old)
AioContext; and bdrv_drained_end() really only works in the main loop
and the node's (new) AioContext (contrary to its current comment, which
is just wrong).

Consequentially, bdrv_set_aio_context_ignore() must be called from the
main loop.  Luckily, assuming that we can make block graph changes only
from the main loop as well, all its callers do that already.

Note that changing a node's context in a sense is an operation that
changes the block graph, so it actually makes sense to require this
function to be called from the main loop.

Also, fix bdrv_drained_end()'s description.  You can only use it from
the main loop or the node's AioContext, and in the latter case, the
whole subtree must be in the same context.

Fixes: e037c09cSigned-off-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20190722133054.21781-3-mreitz@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

43eaaaef

19 7月, 2019 2 次提交

block: Do not poll in bdrv_do_drained_end() · e037c09c

由 Max Reitz 提交于 7月 19, 2019

We should never poll anywhere in bdrv_do_drained_end() (including its
recursive callees like bdrv_drain_invoke()), because it does not cope
well with graph changes.  In fact, it has been written based on the
postulation that no graph changes will happen in it.

Instead, the callers that want to poll must poll, i.e. all currently
globally available wrappers: bdrv_drained_end(),
bdrv_subtree_drained_end(), bdrv_unapply_subtree_drain(), and
bdrv_drain_all_end().  Graph changes there do not matter.

They can poll simply by passing a pointer to a drained_end_counter and
wait until it reaches 0.

This patch also adds a non-polling global wrapper for
bdrv_do_drained_end() that takes a drained_end_counter pointer.  We need
such a variant because now no function called anywhere from
bdrv_do_drained_end() must poll.  This includes
BdrvChildRole.drained_end(), which already must not poll according to
its interface documentation, but bdrv_child_cb_drained_end() just
violates that by invoking bdrv_drained_end() (which does poll).
Therefore, BdrvChildRole.drained_end() must take a *drained_end_counter
parameter, which bdrv_child_cb_drained_end() can pass on to the new
bdrv_drained_end_no_poll() function.

Note that we now have a pattern of all drained_end-related functions
either polling or receiving a *drained_end_counter to let the caller
poll based on that.

A problem with a single poll loop is that when the drained section in
bdrv_set_aio_context_ignore() ends, some nodes in the subgraph may be in
the old contexts, while others are in the new context already.  To let
the collective poll in bdrv_drained_end() work correctly, we must not
hold a lock to the old context, so that the old context can make
progress in case it is different from the current context.

(In the process, remove the comment saying that the current context is
always the old context, because it is wrong.)

In all other places, all nodes in a subtree must be in the same context,
so we can just poll that.  The exception of course is
bdrv_drain_all_end(), but that always runs in the main context, so we
can just poll NULL (like bdrv_drain_all_begin() does).
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

e037c09c

block: Introduce BdrvChild.parent_quiesce_counter · 804db8ea

由 Max Reitz 提交于 7月 19, 2019

Commit 5cb2737e laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

804db8ea

15 7月, 2019 2 次提交

block: Deep-clear inherits_from · 3cf746b3

由 Max Reitz 提交于 7月 03, 2019

BDS.inherits_from does not always point to an immediate parent node.
When launching a block job with a filter node, for example, the node
directly below the filter will not point to the filter, but keep its old
pointee (above the filter).

If that pointee goes away while the job is still running, the node's
inherits_from will not be updated and thus point to garbage.  To fix
this, bdrv_unref_child() has to check not only the parent node's
immediate children for nodes whose inherits_from needs to be cleared,
but its whole subtree.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20190703172813.6868-7-mreitz@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

3cf746b3

block: Add BDS.never_freeze · e5182c1c

由 Max Reitz 提交于 7月 03, 2019

The commit and the mirror block job must be able to drop their filter
node at any point.  However, this will not be possible if any of the
BdrvChild links to them is frozen.  Therefore, we need to prevent them
from ever becoming frozen.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NAndrey Shinkevich <andrey.shinkevich@virtuozzo.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Message-id: 20190703172813.6868-2-mreitz@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

e5182c1c

18 6月, 2019 5 次提交

block: Ignore loosening perm restrictions failures · 1046779e

由 Max Reitz 提交于 5月 22, 2019

We generally assume that loosening permission restrictions can never
fail.  We have seen in the past that this assumption is wrong.  This has
led to crashes because we generally pass &error_abort when loosening
permissions.

However, a failure in such a case should actually be handled in quite
the opposite way: It is very much not fatal, so qemu may report it, but
still consider the operation successful.  The only realistic problem is
that qemu may then retain permissions and thus locks on images it
actually does not require.  But again, that is not fatal.

To implement this behavior, we make all functions that change
permissions and that pass &error_abort to the initiating function
(bdrv_check_perm() or bdrv_child_check_perm()) evaluate the
@loosen_restrictions value introduced in the previous patch.  If it is
true and an error did occur, we abort the permission update, discard the
error, and instead report success to the caller.

bdrv_child_try_set_perm() itself does not pass &error_abort, but it is
the only public function to change permissions.  As such, callers may
pass &error_abort to it, expecting dropping permission restrictions to
never fail.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

1046779e

block: Add *tighten_restrictions to *check*_perm() · 9eab1544

由 Max Reitz 提交于 5月 22, 2019

This patch makes three functions report whether the necessary permission
change tightens restrictions or not.  These functions are:
- bdrv_check_perm()
- bdrv_check_update_perm()
- bdrv_child_check_perm()

Callers can use this result to decide whether a failure is fatal or not
(see the next patch).
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

9eab1544

block: Fix order in bdrv_replace_child() · 87ace5f8

由 Max Reitz 提交于 5月 22, 2019

We have to start by applying the permission restrictions to new_bs
before we can loosen them on old_bs.  See the comment for the
explanation.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

87ace5f8

block: Add bdrv_child_refresh_perms() · c1087f12

由 Max Reitz 提交于 5月 22, 2019

If a block node uses bdrv_child_try_set_perm() to change the permission
it takes on its child, the result may be very short-lived.  If anything
makes the block layer recalculate the permissions internally, it will
invoke the node driver's .bdrv_child_perm() implementation.  The
permission/shared permissions masks that returns will then override the
values previously passed to bdrv_child_try_set_perm().

If drivers want a child edge to have specific values for the
permissions/shared permissions mask, it must return them in
.bdrv_child_perm().  Consequentially, there is no need for them to pass
the same values to bdrv_child_try_set_perm() then: It is better to have
a function that invokes .bdrv_child_perm() and calls
bdrv_child_try_set_perm() with the result.  This patch adds such a
function under the name of bdrv_child_refresh_perms().
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

c1087f12

block: drop bs->job · b23c580c

由 Vladimir Sementsov-Ogievskiy 提交于 6月 06, 2019

Drop remaining users of bs->job:
1. assertions actually duplicated by assert(!bs->refcnt)
2. trace-point seems not enough reason to change stream_start to return
   BlockJob pointer
3. Restricting creation of two jobs based on same bs is bad idea, as
   3.1 Some jobs creates filters to be their main node, so, this check
   don't actually prevent creating second job on same real node (which
   will create another filter node) (but I hope it is restricted by
   other mechanisms)
   3.2 Even without bs->job we have two systems of permissions:
   op-blockers and BLK_PERM
   3.3 We may want to run several jobs on one node one day

And finally, drop bs->job pointer itself. Hurrah!
Suggested-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

b23c580c

04 6月, 2019 6 次提交

block: Remove bdrv_set_aio_context() · 42a65f02

由 Kevin Wolf 提交于 5月 07, 2019

All callers of bdrv_set_aio_context() are eliminated now, they have
moved to bdrv_try_set_aio_context() and related safe functions. Remove
bdrv_set_aio_context().

With this, we can now know that the .set_aio_ctx callback must be
present in bdrv_set_aio_context_ignore() because
bdrv_can_set_aio_context() would have returned false previously, so
instead of checking the condition, we can assert it.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

42a65f02

block: Remove wrong bdrv_set_aio_context() calls · d0ee0204

由 Kevin Wolf 提交于 5月 07, 2019

The mirror and commit block jobs use bdrv_set_aio_context() to move
their filter node into the right AioContext before hooking it up in the
graph. Similarly, bdrv_open_backing_file() explicitly moves the backing
file node into the right AioContext first.

This isn't necessary any more, they get automatically moved into the
right context now when attaching them.

However, in the case of bdrv_open_backing_file() with a node reference,
it's actually not only unnecessary, but even wrong: The unchecked
bdrv_set_aio_context() changes the AioContext of the child node even if
other parents require it to retain the old context. So this is not only
a simplification, but a bug fix, too.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1684342Signed-off-by: NKevin Wolf <kwolf@redhat.com>

d0ee0204

block: Move node without parents to main AioContext · ad943dcb

由 Kevin Wolf 提交于 4月 24, 2019

A node should only be in a non-default AioContext if a user is attached
to it that requires this. When the last parent of a node is gone, it can
move back to the main AioContext.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

ad943dcb

block: Adjust AioContexts when attaching nodes · 132ada80

由 Kevin Wolf 提交于 4月 24, 2019

So far, we only made sure that updating the AioContext of a node
affected the whole subtree. However, if a node is newly attached to a
new parent, we also need to make sure that both the subtree of the node
and the parent are in the same AioContext. This tries to move the new
child node to the parent AioContext and returns an error if this isn't
possible.

BlockBackends now actually apply their AioContext to their root node.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

132ada80

block: Add BlockBackend.ctx · d861ab3a

由 Kevin Wolf 提交于 4月 25, 2019

This adds a new parameter to blk_new() which requires its callers to
declare from which AioContext this BlockBackend is going to be used (or
the locks of which AioContext need to be taken anyway).

The given context is only stored and kept up to date when changing
AioContexts. Actually applying the stored AioContext to the root node
is saved for another commit.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

d861ab3a

block: Drain source node in bdrv_replace_node() · f871abd6

由 Kevin Wolf 提交于 5月 21, 2019

Instead of just asserting that no requests are in flight in
bdrv_replace_node(), which is a requirement that most callers ignore, we
can just drain the source node right there. This fixes at least starting
a commit job while I/O is active on the backing chain, but probably
other callers, too.

Having requests in flight on the target node isn't a problem because the
target just gets new parents, but the call path of running requests
isn't modified. So we can just drop this assertion without a replacement.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1711643Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

f871abd6

29 5月, 2019 2 次提交

block: Make bdrv_root_attach_child() unref child_bs on failure · b441dc71

由 Alberto Garcia 提交于 5月 13, 2019

A consequence of the previous patch is that bdrv_attach_child()
transfers the reference to child_bs from the caller to parent_bs,
which will drop it on bdrv_close() or when someone calls
bdrv_unref_child().

But this only happens when bdrv_attach_child() succeeds. If it fails
then the caller is responsible for dropping the reference to child_bs.

This patch makes bdrv_attach_child() take the reference also when
there is an error, freeing the caller for having to do it.

A similar situation happens with bdrv_root_attach_child(), so the
changes on this patch affect both functions.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Message-id: 20dfb3d9ccec559cdd1a9690146abad5d204a186.1557754872.git.berto@igalia.com
[mreitz: Removed now superfluous BdrvChild * variable in
         bdrv_open_child()]
Signed-off-by: NMax Reitz <mreitz@redhat.com>

b441dc71

block: Use bdrv_unref_child() for all children in bdrv_close() · dd4118c7

由 Alberto Garcia 提交于 5月 13, 2019

bdrv_unref_child() does the following things:

  - Updates the child->bs->inherits_from pointer.
  - Calls bdrv_detach_child() to remove the BdrvChild from bs->children.
  - Calls bdrv_unref() to unref the child BlockDriverState.

When bdrv_unref_child() was introduced in commit 33a60407 it was not
used in bdrv_close() because the drivers that had additional children
(like quorum or blkverify) had already called bdrv_unref() on their
children during their own close functions.

This was changed later (in 0bd6e91a for quorum, in 3e586be0 for
blkverify) so there's no reason not to use bdrv_unref_child() in
bdrv_close() anymore.

After this there's also no need to remove bs->backing and bs->file
separately from the rest of the children, so bdrv_close() can be
simplified.

Now bdrv_close() unrefs all children (before this patch it was only
bs->file and bs->backing). As a result, none of the callers of
brvd_attach_child() should remove their reference to child_bs (because
this function effectively steals that reference). This patch updates a
couple of tests that were doing their own bdrv_unref().
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Message-id: 6d1d5feaa53aa1ab127adb73d605dc4503e3abd5.1557754872.git.berto@igalia.com
[mreitz: s/where/were/]
Signed-off-by: NMax Reitz <mreitz@redhat.com>

dd4118c7

20 5月, 2019 5 次提交

block: Improve "Block node is read-only" message · 481e0eee

由 Max Reitz 提交于 5月 15, 2019

This message does not make any sense when it appears as the response to
making an R/W node read-only.  We should detect that case and emit a
different message, then.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

481e0eee

block: Propagate AioContext change to parents · 53a7d041

由 Kevin Wolf 提交于 5月 06, 2019

All block nodes and users in any connected component of the block graph
must be in the same AioContext, so changing the AioContext of one node
must not only change all of its children, but all of its parents (and
in turn their children etc.) as well.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

53a7d041

block: Move recursion to bdrv_set_aio_context() · 0d83708a

由 Kevin Wolf 提交于 5月 06, 2019

Instead of having two recursions, in bdrv_attach_aio_context() and in
bdrv_detach_aio_context(), just having one recursion is enough. Said
functions are only about a single node now.

It is important that the recursion doesn't happen between detaching and
attaching a context to the current node because the nested call will
drain the node, and draining with a NULL context would segfault.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

0d83708a

block: Make bdrv_attach/detach_aio_context() static · a3a683c3

由 Kevin Wolf 提交于 5月 06, 2019

Since commit b97511c7, there is no reason for block drivers any more
to call these functions (see the function comment in block_int.h). They
are now just internal helper functions for bdrv_set_aio_context()
and can be made static.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

a3a683c3

block: Add bdrv_try_set_aio_context() · 5d231849

由 Kevin Wolf 提交于 5月 06, 2019

Eventually, we want to make sure that all parents and all children of a
node are in the same AioContext as the node itself. This means that
changing the AioContext may fail because one of the other involved
parties (e.g. a guest device that was configured with an iothread)
cannot allow switching to a different AioContext.

Introduce a set of functions that allow to first check whether all
involved nodes can switch to a new context and only then do the actual
switch. The check recursively covers children and parents.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

5d231849

10 5月, 2019 2 次提交

block: remove bs from lists before closing · 30c321f9

由 Anton Kuchin 提交于 5月 07, 2019

Close involves flush that can be performed asynchronously and bs
must be protected from being referenced before it is deleted.
Signed-off-by: NAnton Kuchin <antonkuchin@yandex-team.ru>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

30c321f9

block: Add coroutine_fn to bdrv_check_co_entry · 66a5bdf3

由 Nikita Alekseev 提交于 4月 01, 2019

bdrv_check_co_entry calls bdrv_co_check, which is a coroutine function.
Thus, it also needs to be marked as a coroutine.
Signed-off-by: NNikita Alekseev <n.alekseev2104@gmail.com>
Message-id: 20190401093051.16488-1-n.alekseev2104@gmail.com
Message-Id: <20190401093051.16488-1-n.alekseev2104@gmail.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

66a5bdf3

07 5月, 2019 1 次提交

block: Assert that drv->bdrv_child_perm is set in bdrv_child_perm() · 0b3ca76e

由 Alberto Garcia 提交于 4月 04, 2019

There is no need to check for this because all block drivers that have
children implement bdrv_child_perm and all callers already ensure that
bs->drv is set.

Furthermore, if this check would fail then the callers would end up
with uninitialized values for nperm and nshared.

This patch replaces the check with an assertion.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Message-id: 20190404112953.4058-1-berto@igalia.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

0b3ca76e

30 4月, 2019 1 次提交

block: Fix AioContext switch for bs->drv == NULL · 1bffe1ae

由 Kevin Wolf 提交于 4月 17, 2019

Even for block nodes with bs->drv == NULL, we can't just ignore a
bdrv_set_aio_context() call. Leaving the node in its old context can
mean that it's still in an iothread context in bdrv_close_all() during
shutdown, resulting in an attempted unlock of the AioContext lock which
we don't hold.

This is an example stack trace of a related crash:

 #0  0x00007ffff59da57f in raise () at /lib64/libc.so.6
 #1  0x00007ffff59c4895 in abort () at /lib64/libc.so.6
 #2  0x0000555555b97b1e in error_exit (err=<optimized out>, msg=msg@entry=0x555555d386d0 <__func__.19059> "qemu_mutex_unlock_impl") at util/qemu-thread-posix.c:36
 #3  0x0000555555b97f7f in qemu_mutex_unlock_impl (mutex=mutex@entry=0x5555568002f0, file=file@entry=0x555555d378df "util/async.c", line=line@entry=507) at util/qemu-thread-posix.c:97
 #4  0x0000555555b92f55 in aio_context_release (ctx=ctx@entry=0x555556800290) at util/async.c:507
 #5  0x0000555555b05cf8 in bdrv_prwv_co (child=child@entry=0x7fffc80012f0, offset=offset@entry=131072, qiov=qiov@entry=0x7fffffffd4f0, is_write=is_write@entry=true, flags=flags@entry=0)
         at block/io.c:833
 #6  0x0000555555b060a9 in bdrv_pwritev (qiov=0x7fffffffd4f0, offset=131072, child=0x7fffc80012f0) at block/io.c:990
 #7  0x0000555555b060a9 in bdrv_pwrite (child=0x7fffc80012f0, offset=131072, buf=<optimized out>, bytes=<optimized out>) at block/io.c:990
 #8  0x0000555555ae172b in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568cc740, i=i@entry=0) at block/qcow2-cache.c:51
 #9  0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568cc740) at block/qcow2-cache.c:248
 #10 0x0000555555ae15de in qcow2_cache_flush (bs=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #11 0x0000555555ae16b1 in qcow2_cache_flush_dependency (c=0x5555568a1700, c=0x5555568a1700, bs=0x555556810680) at block/qcow2-cache.c:194
 #12 0x0000555555ae16b1 in qcow2_cache_entry_flush (bs=bs@entry=0x555556810680, c=c@entry=0x5555568a1700, i=i@entry=0) at block/qcow2-cache.c:194
 #13 0x0000555555ae18dd in qcow2_cache_write (bs=bs@entry=0x555556810680, c=0x5555568a1700) at block/qcow2-cache.c:248
 #14 0x0000555555ae15de in qcow2_cache_flush (bs=bs@entry=0x555556810680, c=<optimized out>) at block/qcow2-cache.c:259
 #15 0x0000555555ad242c in qcow2_inactivate (bs=bs@entry=0x555556810680) at block/qcow2.c:2124
 #16 0x0000555555ad2590 in qcow2_close (bs=0x555556810680) at block/qcow2.c:2153
 #17 0x0000555555ab0c62 in bdrv_close (bs=0x555556810680) at block.c:3358
 #18 0x0000555555ab0c62 in bdrv_delete (bs=0x555556810680) at block.c:3542
 #19 0x0000555555ab0c62 in bdrv_unref (bs=0x555556810680) at block.c:4598
 #20 0x0000555555af4d72 in blk_remove_bs (blk=blk@entry=0x5555568103d0) at block/block-backend.c:785
 #21 0x0000555555af4dbb in blk_remove_all_bs () at block/block-backend.c:483
 #22 0x0000555555aae02f in bdrv_close_all () at block.c:3412
 #23 0x00005555557f9796 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4776

The reproducer I used is a qcow2 image on gluster volume, where the
virtual disk size (4 GB) is larger than the gluster volume size (64M),
so we can easily trigger an ENOSPC. This backend is assigned to a
virtio-blk device using an iothread, and then from the guest a
'dd if=/dev/zero of=/dev/vda bs=1G count=1' causes the VM to stop
because of an I/O error. qemu_gluster_co_flush_to_disk() sets
bs->drv = NULL on error, so when virtio-blk stops the dataplane, the
block nodes stay in the iothread AioContext. A 'quit' monitor command
issued from this paused state crashes the process.

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1631227
Cc: qemu-stable@nongnu.org
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>

1bffe1ae

08 4月, 2019 1 次提交

block: Forward 'discard' to temporary overlay · 3f48686f

由 Kevin Wolf 提交于 4月 04, 2019

When bdrv_temp_snapshot_options() is called for snapshot=on, the
'discard' option in the options QDict hasn't been parsed and merged into
the flags yet. So copy the dict entry to make sure that the temporary
overlay enables discard when it was requested for the drive.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefano Garzarella <sgarzare@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>

3f48686f

02 4月, 2019 1 次提交

block: continue until base is found in bdrv_freeze_backing_chain() et al · 0f0998f6

由 Alberto Garcia 提交于 3月 28, 2019

All three functions that handle the BdrvChild.frozen attribute walk
the backing chain from 'bs' to 'base' and stop either when 'base' is
found or at the end of the chain if 'base' is NULL.

However if 'base' is not found then the functions return without
errors as if it was NULL.

This is wrong: if the caller passed an incorrect parameter that means
that there is a bug in the code.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

0f0998f6

19 3月, 2019 1 次提交

block: Silence Coverity in bdrv_drop_intermediate() · 2345bde6

由 Kevin Wolf 提交于 3月 15, 2019

Coverity doesn't like that the return value of bdrv_check_update_perm()
stays unused only in this place (CID 1399710).

Even if checking local_err should be equivalent to checking ret < 0,
let's switch to using the return value to be more consistent (and in
case of a bug somewhere down the call chain, forgetting to assign errp
is more likely than returning 0 for an error case).
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NPeter Maydell <peter.maydell@linaro.org>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>

2345bde6

13 3月, 2019 10 次提交

block: Remove the AioContext parameter from bdrv_reopen_multiple() · 5019aece

由 Alberto Garcia 提交于 3月 12, 2019

This parameter has been unused since 1a63a907Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

5019aece

block: Add bdrv_reset_options_allowed() · faf116b4

由 Alberto Garcia 提交于 3月 12, 2019

bdrv_reopen_prepare() receives a BDRVReopenState with (among other
things) a new set of options to be applied to that BlockDriverState.

If an option is missing then it means that we want to reset it to its
default value rather than keep the previous one. This way the state
of the block device after being reopened is comparable to that of a
device added with "blockdev-add" using the same set of options.

Not all options from all drivers can be changed this way, however.
If the user attempts to reset an immutable option to its default value
using this method then we must forbid it.

This new function takes a BlockDriverState and a new set of options
and checks if there's any option that was previously set but is
missing from the new set of options.

If the option is present in both sets we don't need to check that they
have the same value. The loop at the end of bdrv_reopen_prepare()
already takes care of that.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

faf116b4

block: Allow changing the backing file on reopen · cb828c31

由 Alberto Garcia 提交于 3月 12, 2019

This patch allows the user to change the backing file of an image that
is being reopened. Here's what it does:

 - In bdrv_reopen_prepare(): check that the value of 'backing' points
   to an existing node or is null. If it points to an existing node it
   also needs to make sure that replacing the backing file will not
   create a cycle in the node graph (i.e. you cannot reach the parent
   from the new backing file).

 - In bdrv_reopen_commit(): perform the actual node replacement by
   calling bdrv_set_backing_hd().

There may be temporary implicit nodes between a BDS and its backing
file (e.g. a commit filter node). In these cases bdrv_reopen_prepare()
looks for the real (non-implicit) backing file and requires that the
'backing' option points to it. Replacing or detaching a backing file
is forbidden if there are implicit nodes in the middle.

Although x-blockdev-reopen is meant to be used like blockdev-add,
there's an important thing that must be taken into account: the only
way to set a new backing file is by using a reference to an existing
node (previously added with e.g. blockdev-add).  If 'backing' contains
a dictionary with a new set of options ({"driver": "qcow2", "file": {
... }}) then it is interpreted that the _existing_ backing file must
be reopened with those options.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

cb828c31

block: Allow omitting the 'backing' option in certain cases · bacd9b87

由 Alberto Garcia 提交于 3月 12, 2019

Of all options of type BlockdevRef used to specify children in
BlockdevOptions, 'backing' is the only one that is optional.

For "x-blockdev-reopen" we want that if an option is omitted then it
must be reset to its default value. The default value of 'backing'
means that QEMU opens the backing file specified in the image
metadata, but this is not something that we want to support for the
reopen operation.

Because of this the 'backing' option has to be specified during
reopen, pointing to the existing backing file if we want to keep it,
or pointing to a different one (or NULL) if we want to replace it (to
be implemented in a subsequent patch).

In order to simplify things a bit and not to require that the user
passes the 'backing' option to every single block device even when
it's clearly not necessary, this patch allows omitting this option if
the block device being reopened doesn't have a backing file attached
_and_ no default backing file is specified in the image metadata.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

bacd9b87

block: Handle child references in bdrv_reopen_queue() · 8546632e

由 Alberto Garcia 提交于 3月 12, 2019

Children in QMP are specified with BlockdevRef / BlockdevRefOrNull,
which can contain a set of child options, a child reference, or
NULL. In optional attributes like "backing" it can also be missing.

Only the first case (set of child options) is being handled properly
by bdrv_reopen_queue(). This patch deals with all the others.

Here's how these cases should be handled when bdrv_reopen_queue() is
deciding what to do with each child of a BlockDriverState:

1) Set of child options: if the child was implicitly created (i.e
inherits_from points to the parent) then the options are removed
from the parent's options QDict and are passed to the child with
a recursive bdrv_reopen_queue() call. This case was already
working fine.

2) Child reference: there's two possibilites here.

2a) Reference to the current child: if the child was implicitly
created then it is put in the reopen queue, keeping its
current set of options (since this was a child reference
there was no way to specify a different set of options).
If the child is not implicit then it keeps its current set
of options but it is not reopened (and therefore does not
inherit any new option from the parent).

2b) Reference to a different BDS: the current child is not put
in the reopen queue at all. Passing a reference to a
different BDS can be used to replace a child, although at
the moment no driver implements this, so it results in an
error. In any case, the current child is not going to be
reopened (and might in fact disappear if it's replaced)

3) NULL: This is similar to (2b). Although no driver allows this
yet it can be used to detach the current child so it should not
be put in the reopen queue.

4) Missing option: at the moment "backing" is the only case where
this can happen. With "blockdev-add", leaving "backing" out
means that the default backing file is opened. We don't want to
open a new image during reopen, so we require that "backing" is
always present. We'll relax this requirement a bit in the next
patch. If keep_old_opts is true and "backing" is missing then
this behaves like 2a (the current child is reopened).
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

8546632e

block: Add 'keep_old_opts' parameter to bdrv_reopen_queue() · 077e8e20

由 Alberto Garcia 提交于 3月 12, 2019

The bdrv_reopen_queue() function is used to create a queue with
the BDSs that are going to be reopened and their new options. Once
the queue is ready bdrv_reopen_multiple() is called to perform the
operation.

The original options from each one of the BDSs are kept, with the new
options passed to bdrv_reopen_queue() applied on top of them.

For "x-blockdev-reopen" we want a function that behaves much like
"blockdev-add". We want to ignore the previous set of options so that
only the ones actually specified by the user are applied, with the
rest having their default values.

One of the things that we need is a way to tell bdrv_reopen_queue()
whether we want to keep the old set of options or not, and that's what
this patch does. All current callers are setting this new parameter to
true and x-blockdev-reopen will set it to false.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

077e8e20

block: Allow freezing BdrvChild links · 2cad1ebe

由 Alberto Garcia 提交于 3月 12, 2019

Our permission system is useful to define what operations are allowed
on a certain block node and includes things like BLK_PERM_WRITE or
BLK_PERM_RESIZE among others.

One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing
the node that this BdrvChild points to". The exact meaning of this has
never been very clear, but it can be understood as "change any of the
links connected to the node". This can be used to prevent changing a
backing link, but it's too coarse.

This patch adds a new 'frozen' attribute to BdrvChild, which forbids
detaching the link from the node it points to, and new API to freeze
and unfreeze a backing chain.

After this change a few functions can fail, so they need additional
checks.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

2cad1ebe

file-posix: Fix bdrv_open_flags() for snapshot=on · 30855137

由 Kevin Wolf 提交于 3月 11, 2019

Using a different read-only setting for bs->open_flags than for the
flags to the driver's open function is just inconsistent and a bad idea.
After this patch, the temporary snapshot keeps being opened read-only if
read-only=on,snapshot=on is passed.

If we wanted to change this behaviour to make only the orginal image
file read-only, but the temporary overlay read-write (as the comment in
the removed code suggests), that change would have to be made in
bdrv_temp_snapshot_options() (where the comment suggests otherwise).

Addressing this inconsistency before introducing dynamic auto-read-only
is important because otherwise we would immediately try to reopen the
temporary overlay even though the file is already unlinked.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

30855137

block: Make permission changes in reopen less wrong · 69b736e7

由 Kevin Wolf 提交于 3月 05, 2019

The way that reopen interacts with permission changes has one big
problem: Both operations are recursive, and the permissions are changes
for each node in the reopen queue.

For a simple graph that consists just of parent and child,
.bdrv_check_perm will be called twice for the child, once recursively
when adjusting the permissions of parent, and once again when the child
itself is reopened.

Even worse, the first .bdrv_check_perm call happens before
.bdrv_reopen_prepare was called for the child and the second one is
called afterwards.

Making sure that .bdrv_check_perm (and the other permission callbacks)
are called only once is hard. We can cope with multiple calls right now,
but as soon as file-posix gets a dynamic auto-read-only that may need to
open a new file descriptor, we get the additional requirement that all
of them are after the .bdrv_reopen_prepare call.

So reorder things in bdrv_reopen_multiple() to first call
.bdrv_reopen_prepare for all involved nodes and only then adjust
permissions.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

69b736e7

block: Avoid useless local_err · a4615ab3

由 Kevin Wolf 提交于 3月 05, 2019

Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

a4615ab3