• M
    block: Introduce BdrvChild.parent_quiesce_counter · 804db8ea
    Max Reitz 提交于
    Commit 5cb2737e laid out why
    bdrv_do_drained_end() must decrement the quiesce_counter after
    bdrv_drain_invoke().  It did not give a very good reason why it has to
    happen after bdrv_parent_drained_end(), instead only claiming symmetry
    to bdrv_do_drained_begin().
    
    It turns out that delaying it for so long is wrong.
    
    Situation: We have an active commit job (i.e. a mirror job) from top to
    base for the following graph:
    
                      filter
                        |
                      [file]
                        |
                        v
    top --[backing]--> base
    
    Now the VM is closed, which results in the job being cancelled and a
    bdrv_drain_all() happening pretty much simultaneously.
    
    Beginning the drain means the job is paused once whenever one of its
    nodes is quiesced.  This is reversed when the drain ends.
    
    With how the code currently is, after base's drain ends (which means
    that it will have unpaused the job once), its quiesce_counter remains at
    1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
    some reason or another, undraining filter causes the job to be kicked
    and enter mirror_exit_common(), where it proceeds to invoke
    block_job_remove_all_bdrv().
    
    Now base will be detached from the job.  Because its quiesce_counter is
    still 1, it will unpause the job once more.  So in total, undraining
    base will unpause the job twice.  Eventually, this will lead to the
    job's pause_count going negative -- well, it would, were there not an
    assertion against this, which crashes qemu.
    
    The general problem is that if in bdrv_parent_drained_end() we undrain
    parent A, and then undrain parent B, which then leads to A detaching the
    child, bdrv_replace_child_noperm() will undrain A as if we had not done
    so yet; that is, one time too many.
    
    It follows that we cannot decrement the quiesce_counter after invoking
    bdrv_parent_drained_end().
    
    Unfortunately, decrementing it before bdrv_parent_drained_end() would be
    wrong, too.  Imagine the above situation in reverse: Undraining A leads
    to B detaching the child.  If we had already decremented the
    quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
    B one time too little; because it expects bdrv_parent_drained_end() to
    issue this undrain.  But bdrv_parent_drained_end() won't do that,
    because B is no longer a parent.
    
    Therefore, we have to do something else.  This patch opts for
    introducing a second quiesce_counter that counts how many times a
    child's parent has been quiesced (though c->role->drained_*).  With
    that, bdrv_replace_child_noperm() just has to undrain the parent exactly
    that many times when removing a child, and it will always be right.
    Signed-off-by: NMax Reitz <mreitz@redhat.com>
    Signed-off-by: NKevin Wolf <kwolf@redhat.com>
    804db8ea
block.c 197.2 KB