1. 25 Sep 2018, 40 commits
    • test-bdrv-drain: Test draining job source child and parent · d8b3afd5
      Kevin Wolf committed
      For the block job drain test, don't only test draining the source and
      the target node, but create a backing chain for the source
      (source_backing <- source <- source_overlay) and test draining each of
      the nodes in it.
      
      When using iothreads, the source node (and therefore the job) is in a
      different AioContext than the drain, which happens from the main
      thread. This way, the main thread waits in AIO_WAIT_WHILE() for the
      iothread to make progress, and aio_wait_kick() is required to notify it.
      The test validates that calling bdrv_wakeup() for a child or a parent
      node will actually notify AIO_WAIT_WHILE() instead of letting it hang.
      
      Increase the sleep time a bit (to 1 ms) because the test case is racy:
      with the shorter sleep, it didn't reproduce for me the bug it is supposed
      to test under 'rr record -n'.
      
      This was because bdrv_drain_invoke_entry() (in the main thread) was only
      called after the job had already reached the pause point, so we got a
      bdrv_dec_in_flight() from the main thread and the additional
      aio_wait_kick() when the job becomes idle (that we really wanted to test
      here) wasn't even necessary any more to make progress.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      d8b3afd5
    • block: Use a single global AioWait · cfe29d82
      Kevin Wolf committed
      When draining a block node, we recurse to its parent and for subtree
      drains also to its children. A single AIO_WAIT_WHILE() is then used to
      wait for bdrv_drain_poll() to become true, which depends on all of the
      nodes we recursed to. However, if the respective child or parent becomes
      quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
      is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
      original node.
      
      Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().
      
      This may mean that the draining thread gets a few more unnecessary
      wakeups because an unrelated operation got completed, but we already
      wake it up when something _could_ have changed rather than only if it
      has certainly changed.
      
      Apart from that, drain is a slow path anyway. In theory it would be
      possible to use wakeups more selectively and still correctly, but the
      gains are likely not worth the additional complexity. In fact, this
      patch is a nice simplification for some places in the code.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      cfe29d82
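      The pattern can be sketched as a standalone C model (the names WaitModel,
      wait_kick() and wait_while() are made up for illustration; this is not
      QEMU's actual AIO_WAIT_WHILE() implementation): every waiter registers on
      one global object, any completing operation kicks that object, and a
      spurious kick merely causes one extra re-check of the condition.

      /* Standalone sketch of the single-global-wait idea; illustrative only. */
      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      typedef struct {
          atomic_int num_waiters;     /* threads currently waiting on a condition */
      } WaitModel;

      static WaitModel global_wait;   /* one instance shared by all waiters */

      /* Any operation that might have unblocked a waiter kicks the global
       * object; it need not know which node the waiter was actually polling. */
      static void wait_kick(void)
      {
          if (atomic_load(&global_wait.num_waiters) > 0) {
              /* In QEMU this would notify the waiter's event loop; the sketch
               * relies on the waiter polling, so nothing else is needed here. */
          }
      }

      /* Poll until the condition becomes false, re-checking after every step;
       * an unnecessary wakeup just means one extra check of the condition. */
      static void wait_while(bool (*cond)(void *), void (*poll_once)(void *), void *opaque)
      {
          atomic_fetch_add(&global_wait.num_waiters, 1);
          while (cond(opaque)) {
              poll_once(opaque);      /* stands in for aio_poll(ctx, true) */
          }
          atomic_fetch_sub(&global_wait.num_waiters, 1);
      }

      /* Tiny usage example: "drain" until three pending requests completed. */
      static bool still_busy(void *opaque) { return *(int *)opaque > 0; }
      static void poll_once(void *opaque)  { (*(int *)opaque)--; wait_kick(); }

      int main(void)
      {
          int in_flight = 3;
          wait_while(still_busy, poll_once, &in_flight);
          printf("drained, in_flight=%d\n", in_flight);
          return 0;
      }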
    • test-bdrv-drain: Fix outdated comments · 5599c162
      Kevin Wolf committed
      Commit 89bd0305 changed the test case from using job_sleep_ns() to
      using qemu_co_sleep_ns(). Also, block_job_sleep_ns() became
      job_sleep_ns() in commit 5d43e86e.
      
      In both cases, some comments in the test case were not updated. Do that
      now.
      Reported-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      5599c162
    • test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort · d49725af
      Kevin Wolf committed
      This adds tests for calling AIO_WAIT_WHILE() in the .commit and .abort
      callbacks. Both reasons why .abort could be called for a single job are
      tested: Either .run or .prepare could return an error.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      d49725af
    • job: Avoid deadlocks in job_completed_txn_abort() · 644f3a29
      Kevin Wolf committed
      Amongst others, job_finalize_single() calls the .prepare/.commit/.abort
      callbacks of the individual job driver. Recently, their use was adapted
      for all block jobs so that they involve code calling AIO_WAIT_WHILE()
      now. Such code must be called under the AioContext lock for the
      respective job, but without holding any other AioContext lock.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      644f3a29
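      The locking rule can be illustrated with a standalone sketch in which
      pthread mutexes stand in for AioContext locks (all names are made up;
      this is not the actual job_completed_txn_abort() code):

      #include <pthread.h>

      typedef struct CtxModel {
          pthread_mutex_t lock;              /* stands in for an AioContext lock */
      } CtxModel;

      typedef struct JobModel {
          CtxModel *ctx;                              /* the job's home context */
          void (*abort_cb)(struct JobModel *job);     /* may call AIO_WAIT_WHILE() */
      } JobModel;

      /* Called with outer->lock held; temporarily switch to the job's own
       * context lock so the callback never runs under a foreign lock. */
      static void finalize_one(JobModel *job, CtxModel *outer)
      {
          if (job->ctx != outer) {
              pthread_mutex_unlock(&outer->lock);    /* drop the foreign lock ... */
              pthread_mutex_lock(&job->ctx->lock);   /* ... hold only the job's own */
          }
          if (job->abort_cb) {
              job->abort_cb(job);                    /* safe to wait/poll in here */
          }
          if (job->ctx != outer) {
              pthread_mutex_unlock(&job->ctx->lock);
              pthread_mutex_lock(&outer->lock);      /* restore the caller's locking */
          }
      }

      static void noop_abort(JobModel *job) { (void)job; }

      int main(void)
      {
          CtxModel main_ctx = { PTHREAD_MUTEX_INITIALIZER };
          CtxModel job_ctx  = { PTHREAD_MUTEX_INITIALIZER };
          JobModel job = { &job_ctx, noop_abort };

          pthread_mutex_lock(&main_ctx.lock);
          finalize_one(&job, &main_ctx);
          pthread_mutex_unlock(&main_ctx.lock);
          return 0;
      }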
    • test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level() · ecc1a5c7
      Kevin Wolf committed
      This is a regression test for a deadlock that could occur in callbacks
      called from the aio_poll() in bdrv_drain_poll_top_level(). The
      AioContext lock wasn't released and therefore would be taken a second
      time in the callback. This would cause a possible AIO_WAIT_WHILE() in
      the callback to hang.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      ecc1a5c7
    • block: Remove aio_poll() in bdrv_drain_poll variants · 4cf077b5
      Kevin Wolf committed
      bdrv_drain_poll_top_level() was buggy because it didn't release the
      AioContext lock of the node to be drained before calling aio_poll().
      This way, callbacks called by aio_poll() would possibly take the lock a
      second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
      
      However, it turns out that the aio_poll() call isn't actually needed any
      more. It was introduced in commit 91af091f, which is effectively
      reverted by this patch. The cases it was supposed to fix are now covered
      by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
      state.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      4cf077b5
    • blockjob: Lie better in child_job_drained_poll() · b5a7a057
      Kevin Wolf committed
      Block jobs claim in .drained_poll() that they are in a quiescent state
      as soon as job->deferred_to_main_loop is true. This is obviously wrong:
      they still have a completion BH to run. We only get away with this
      because commit 91af091f added an unconditional aio_poll(false) to the
      drain functions, but this is bypassing the regular drain mechanisms.
      
      However, just removing this and reporting that the job is still active
      doesn't work either: The completion callbacks themselves call drain
      functions (directly, or indirectly with bdrv_reopen), so they would
      deadlock then.
      
      As a better lie, report that the job is active as long as the BH is
      pending, but falsely call it quiescent from the point in the BH when the
      completion callback is called. At this point, nested drain calls won't
      deadlock because they ignore the job, and outer drains will wait for the
      job to really reach a quiescent state because the callback is already
      running.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      b5a7a057
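      The 'better lie' boils down to a small predicate, shown here as a
      standalone sketch with made-up field names (not the actual
      child_job_drained_poll() code):

      #include <assert.h>
      #include <stdbool.h>

      typedef struct JobModel {
          bool busy;                     /* the job coroutine is actively running */
          bool deferred_to_main_loop;    /* a completion BH has been scheduled */
          bool in_completion_cb;         /* the completion callback is running now */
      } JobModel;

      /* Returns true while a drain still has to wait for this job. */
      static bool job_drained_poll(const JobModel *job)
      {
          if (job->in_completion_cb) {
              /* The lie: pretend to be quiescent so that drains issued from the
               * completion callback itself do not wait for us and deadlock. */
              return false;
          }
          /* The (improved) truth: a pending completion BH still counts as
           * activity, so outer drains keep waiting until the callback runs. */
          return job->busy || job->deferred_to_main_loop;
      }

      int main(void)
      {
          JobModel job = { .busy = false, .deferred_to_main_loop = true,
                           .in_completion_cb = false };
          assert(job_drained_poll(&job));     /* BH pending: still active */

          job.in_completion_cb = true;
          assert(!job_drained_poll(&job));    /* callback running: claim quiescence */
          return 0;
      }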
    • block-backend: Decrease in_flight only after callback · 46aaf2a5
      Kevin Wolf committed
      Request callbacks can do pretty much anything, including operations that
      will yield from the coroutine (such as draining the backend). In that
      case, a decreased in_flight would be visible to other code and could
      lead to a drain completing while the callback hasn't actually completed
      yet.
      
      Note that reordering these operations forbids calling drain directly
      inside an AIO callback. As Paolo explains, indirectly calling it is
      okay:
      
      - Calling it through a coroutine is okay, because then
        bdrv_drained_begin() goes through bdrv_co_yield_to_drain() and you
        have in_flight=2 when bdrv_co_yield_to_drain() yields, then soon
        in_flight=1 when the aio_co_wake() in the AIO callback completes, then
        in_flight=0 after the bottom half starts.
      
      - Calling it through a bottom half would be okay too, as long as the AIO
        callback remembers to do inc_in_flight/dec_in_flight just like
        bdrv_co_yield_to_drain() and bdrv_co_drain_bh_cb() do
      
      A few more important cases that come to mind:
      
      - A coroutine that yields because of I/O is okay, with a sequence
        similar to bdrv_co_yield_to_drain().
      
      - A coroutine that yields with no I/O pending will correctly decrease
        in_flight to zero before yielding.
      
      - Calling more AIO from the callback won't overflow the counter just
        because of mutual recursion, because AIO functions always yield at
        least once before invoking the callback.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      46aaf2a5
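      The ordering requirement can be shown with a standalone model; the names
      are illustrative and this is not the real block-backend code:

      /* Sketch of the fix: the in-flight counter may only drop after the
       * request's completion callback has run, because a concurrent drain
       * treats in_flight == 0 as "no more activity". */
      #include <assert.h>
      #include <stdatomic.h>

      typedef struct AsyncOp {
          void (*cb)(void *opaque, int ret);   /* may drain, yield, issue new I/O */
          void *opaque;
          int ret;
      } AsyncOp;

      static atomic_int in_flight;             /* what a drain waits to reach zero */

      static void op_complete(AsyncOp *op)
      {
          /* Wrong (old) order: a drain could observe in_flight == 0 here even
           * though op->cb() has not run yet and may still create activity:
           *
           *     atomic_fetch_sub(&in_flight, 1);
           *     op->cb(op->opaque, op->ret);
           */

          /* Fixed order: run the callback first, only then hide the request. */
          op->cb(op->opaque, op->ret);
          atomic_fetch_sub(&in_flight, 1);
      }

      /* Minimal usage: submit one request and complete it. */
      static void my_cb(void *opaque, int ret)
      {
          *(int *)opaque = ret;
      }

      int main(void)
      {
          int result = -1;
          AsyncOp op = { my_cb, &result, 0 };

          atomic_fetch_add(&in_flight, 1);     /* request submitted */
          op_complete(&op);                    /* request completed */

          assert(result == 0 && atomic_load(&in_flight) == 0);
          return 0;
      }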
    • block-backend: Fix potential double blk_delete() · 5ca9d21b
      Kevin Wolf committed
      blk_unref() first decreases the refcount of the BlockBackend and calls
      blk_delete() if the refcount reaches zero. Requests can still be in
      flight at this point; they are only drained during blk_delete().
      
      At this point, arbitrary callbacks can run. If any callback takes a
      temporary BlockBackend reference, it will first increase the refcount to
      1 and then decrease it to 0 again, triggering another blk_delete(). This
      will cause a use-after-free crash in the outer blk_delete().
      
      Fix it by draining the BlockBackend before decreasing the refcount to 0.
      Assert in blk_ref() that it never takes the first refcount (which would
      mean that the BlockBackend is already being deleted).
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      5ca9d21b
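      The shape of the fix can be modelled with a generic refcounted object
      standing in for the BlockBackend (illustrative names only, not the real
      blk_unref()/blk_delete() code):

      /* Drain while still holding the last reference, so that callbacks taking
       * a temporary reference go 1 -> 2 -> 1 and can no longer trigger a second
       * delete. */
      #include <assert.h>
      #include <stdlib.h>

      typedef struct Obj {
          int refcnt;
      } Obj;

      static void obj_ref(Obj *o)
      {
          /* Taking the very first reference of an object that is being deleted
           * would be a bug, hence the assertion the commit mentions. */
          assert(o->refcnt > 0);
          o->refcnt++;
      }

      static void obj_drain(Obj *o)
      {
          /* Completion callbacks run here; they may call obj_ref()/obj_unref(). */
          (void)o;
      }

      static void obj_delete(Obj *o)
      {
          assert(o->refcnt == 0);
          free(o);
      }

      static void obj_unref(Obj *o)
      {
          if (!o) {
              return;
          }
          assert(o->refcnt > 0);
          if (o->refcnt == 1) {
              /* Drain before dropping to zero: the count is still 1, so temporary
               * references from callbacks cannot re-enter obj_delete(). */
              obj_drain(o);
          }
          if (--o->refcnt == 0) {
              obj_delete(o);
          }
      }

      int main(void)
      {
          Obj *o = calloc(1, sizeof(*o));
          o->refcnt = 1;
          obj_ref(o);       /* second reference ... */
          obj_unref(o);     /* ... released again */
          obj_unref(o);     /* last reference: drain, then delete exactly once */
          return 0;
      }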
    • block-backend: Add .drained_poll callback · fe5258a5
      Kevin Wolf committed
      A bdrv_drain operation must ensure that all parents are quiesced; this
      includes BlockBackends. Otherwise, callbacks invoked by requests that have
      completed on the BDS layer, but not quite yet on the BlockBackend layer,
      could still create new requests.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      fe5258a5
    • block: Add missing locking in bdrv_co_drain_bh_cb() · aa1361d5
      Kevin Wolf committed
      bdrv_do_drained_begin/end() assume that they are called with the
      AioContext lock of bs held. If we call drain functions from a coroutine
      with the AioContext lock held, we yield and schedule a BH to move out of
      coroutine context. This means that the lock for the home context of the
      coroutine is released and must be re-acquired in the bottom half.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      aa1361d5
    • test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback · ae23dde9
      Kevin Wolf committed
      This is a regression test for a deadlock that occurred in block job
      completion callbacks (via job_defer_to_main_loop) because the AioContext
      lock was taken twice: once in job_finish_sync() and then again in
      job_defer_to_main_loop_bh(). This would cause AIO_WAIT_WHILE() to hang.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      ae23dde9
    • job: Use AIO_WAIT_WHILE() in job_finish_sync() · de0fbe64
      Kevin Wolf committed
      job_finish_sync() needs to release the AioContext lock of the job before
      calling aio_poll(). Otherwise, callbacks called by aio_poll() would
      possibly take the lock a second time and run into a deadlock with a
      nested AIO_WAIT_WHILE() call.
      
      Also, job_drain() without aio_poll() isn't necessarily enough to make
      progress on a job; it may depend on bottom halves being executed.
      
      Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
      that solves both of these problems.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      de0fbe64
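      The lock handling that AIO_WAIT_WHILE() encapsulates here can be sketched
      with a pthread mutex standing in for the AioContext lock (illustrative
      names; not the actual macro):

      #include <pthread.h>
      #include <stdbool.h>

      static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;  /* "AioContext lock" */

      /* Dispatches pending callbacks and bottom halves; those callbacks are
       * allowed to lock ctx_lock, which is why the caller must not hold it. */
      static void poll_and_dispatch(void)
      {
          pthread_mutex_lock(&ctx_lock);
          /* callback work would run here */
          pthread_mutex_unlock(&ctx_lock);
      }

      /* Caller holds ctx_lock on entry and on return, but never across the poll. */
      static void wait_until_done(bool (*done)(void *), void *opaque)
      {
          while (!done(opaque)) {
              pthread_mutex_unlock(&ctx_lock);  /* release before blocking ... */
              poll_and_dispatch();              /* ... so callbacks don't deadlock */
              pthread_mutex_lock(&ctx_lock);    /* condition is re-checked locked */
          }
      }

      static bool job_done(void *opaque) { return --(*(int *)opaque) <= 0; }

      int main(void)
      {
          int steps = 3;
          pthread_mutex_lock(&ctx_lock);
          wait_until_done(job_done, &steps);
          pthread_mutex_unlock(&ctx_lock);
          return 0;
      }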
    • test-blockjob: Acquire AioContext around job_cancel_sync() · 30c070a5
      Kevin Wolf committed
      All callers in QEMU proper hold the AioContext lock when calling
      job_finish_sync(). test-blockjob should do the same when it calls the
      function indirectly through job_cancel_sync().
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      30c070a5
    • test-bdrv-drain: Drain with block jobs in an I/O thread · f62c1729
      Kevin Wolf committed
      This extends the existing drain test with a block job to include
      variants where the block job runs in a different AioContext.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      f62c1729
    • aio-wait: Increase num_waiters even in home thread · 48657448
      Kevin Wolf committed
      Even if AIO_WAIT_WHILE() is called in the home context of the
      AioContext, we still want to allow the condition to change depending on
      other threads as long as they kick the AioWait. Specifically, block jobs
      can be running in an I/O thread and should then be able to kick a drain
      in the main loop context.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      48657448
    • blockjob: Wake up BDS when job becomes idle · 34dc97b9
      Kevin Wolf committed
      In the context of draining a BDS, the .drained_poll callback of block
      jobs is called. If this returns true (i.e. there is still some activity
      pending), the drain operation may call aio_poll() with blocking=true to
      wait for completion.
      
      As soon as the pending activity is completed and the job finally arrives
      in a quiescent state (i.e. its coroutine either yields with busy=false
      or terminates), the block job must notify the aio_poll() loop to wake
      up, otherwise we get a deadlock if both are running in different
      threads.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      34dc97b9
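      The required wakeup can be modelled as a plain wait/notify pair, with a
      condition variable standing in for the kick that wakes the blocking
      aio_poll() (illustrative standalone sketch, not the actual blockjob code):

      #include <pthread.h>
      #include <stdbool.h>
      #include <unistd.h>

      static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
      static pthread_cond_t  kick = PTHREAD_COND_INITIALIZER;  /* the "wake up BDS" part */
      static bool job_quiescent;

      static void *job_thread(void *arg)
      {
          (void)arg;
          usleep(1000);                        /* the job finishes its pending work */
          pthread_mutex_lock(&lock);
          job_quiescent = true;                /* .drained_poll() would now return false */
          pthread_cond_broadcast(&kick);       /* without this, the drain hangs */
          pthread_mutex_unlock(&lock);
          return NULL;
      }

      int main(void)
      {
          pthread_t t;
          pthread_create(&t, NULL, job_thread, NULL);

          /* The draining thread: wait until the job reports quiescence. */
          pthread_mutex_lock(&lock);
          while (!job_quiescent) {
              pthread_cond_wait(&kick, &lock); /* blocks, like aio_poll(ctx, true) */
          }
          pthread_mutex_unlock(&lock);

          pthread_join(t, NULL);
          return 0;
      }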
    • job: Fix missing locking due to mismerge · d1756c78
      Kevin Wolf committed
      job_completed() had a problem with double locking that was recently
      fixed independently by two different commits:
      
      "job: Fix nested aio_poll() hanging in job_txn_apply"
      "jobs: add exit shim"
      
      One fix removed the first aio_context_acquire(), the other fix removed
      the other one. Now we have a bug again and the code is run without any
      locking.
      
      Add it back in one of the places.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: John Snow <jsnow@redhat.com>
      d1756c78
    • job: Fix nested aio_poll() hanging in job_txn_apply · 49880165
      Fam Zheng committed
      All callers have acquired ctx already. Doing that again results in an
      aio_poll() hang. This fixes the problem that a BDRV_POLL_WHILE() in the
      callback cannot make progress because ctx is recursively locked, for
      example, when drive-backup finishes.
      
      There are two callers of job_finalize():
      
          fam@lemon:~/work/qemu [master]$ git grep -w -A1 '^\s*job_finalize'
          blockdev.c:    job_finalize(&job->job, errp);
          blockdev.c-    aio_context_release(aio_context);
          --
          job-qmp.c:    job_finalize(job, errp);
          job-qmp.c-    aio_context_release(aio_context);
          --
          tests/test-blockjob.c:    job_finalize(&job->job, &error_abort);
          tests/test-blockjob.c-    assert(job->job.status == JOB_STATUS_CONCLUDED);
      
      Ignoring the test, it's easy to see that both callers of job_finalize (and
      job_do_finalize) have acquired the context.
      
      Cc: qemu-stable@nongnu.org
      Reported-by: Gu Nini <ngu@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      49880165
    • util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb · 6808ae04
      Sergio Lopez committed
      AIO coroutines shouldn't be managed by an AioContext different from the
      one assigned when they are created. aio_co_enter avoids entering a
      coroutine from a different AioContext, calling aio_co_schedule instead.
      
      Scheduled coroutines are then entered by co_schedule_bh_cb using
      qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
      current AioContext obtained with qemu_get_current_aio_context.
      Eventually, co->ctx will be set to the AioContext passed as an argument
      to qemu_aio_coroutine_enter.
      
      This means that, if an IO Thread's AioContext is being processed by the
      Main Thread (due to aio_poll being called with a BDS AioContext, as it
      happens in AIO_WAIT_WHILE among other places), the AioContext from some
      coroutines may be wrongly replaced with the one from the Main Thread.
      
      This is the root cause behind some crashes, mainly triggered by the
      drain code at block/io.c. The most common are these abort and failed
      assertion:
      
      util/async.c:aio_co_schedule
      456     if (scheduled) {
      457         fprintf(stderr,
      458                 "%s: Co-routine was already scheduled in '%s'\n",
      459                 __func__, scheduled);
      460         abort();
      461     }
      
      util/qemu-coroutine-lock.c:
      286     assert(mutex->holder == self);
      
      But it's also known to cause random errors at different locations, and
      even SIGSEGV with broken coroutine backtraces.
      
      By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
      pass the correct AioContext as an argument, making sure co->ctx is not
      wrongly altered.
      Signed-off-by: Sergio Lopez <slp@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      6808ae04
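      The idea of the fix can be shown with plain structs standing in for
      coroutines and AioContexts (illustrative names; not the util/async.c
      code):

      #include <assert.h>

      typedef struct Ctx { const char *name; } Ctx;

      typedef struct Co {
          Ctx *ctx;                  /* home context; must not change behind our back */
      } Co;

      /* Model of qemu_aio_coroutine_enter(ctx, co): the entered coroutine ends
       * up associated with exactly the context that was passed in. */
      static void coroutine_enter(Ctx *ctx, Co *co)
      {
          co->ctx = ctx;
          /* the coroutine would run here */
      }

      /* Model of co_schedule_bh_cb(): 'sched_ctx' is the context whose scheduled
       * list the BH is draining; 'current' is the thread's current context. */
      static void schedule_bh_cb(Ctx *sched_ctx, Ctx *current, Co *co)
      {
          (void)current;
          /* Old, buggy behaviour: coroutine_enter(current, co);
           * If the main thread polls an iothread's context, 'current' is the
           * main loop context and co->ctx gets silently rebound to it. */
          coroutine_enter(sched_ctx, co);      /* fixed: keep the coroutine's context */
      }

      int main(void)
      {
          Ctx iothread_ctx = { "iothread" };
          Ctx main_ctx     = { "main-loop" };
          Co  co           = { &iothread_ctx };

          /* The main thread processes the iothread's scheduled coroutines. */
          schedule_bh_cb(&iothread_ctx, &main_ctx, &co);
          assert(co.ctx == &iothread_ctx);     /* not rebound to the main loop */
          return 0;
      }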
    • qemu-iotests: Test snapshot=on with nonexistent TMPDIR · 6a7014ef
      Alberto Garcia committed
      We just fixed a bug that was causing a use-after-free when QEMU was
      unable to create a temporary snapshot. This adds a test case for that
      scenario.
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      6a7014ef
    • block: Fix use after free error in bdrv_open_inherit() · 8961be33
      Alberto Garcia committed
      When a block device is opened with BDRV_O_SNAPSHOT and the
      bdrv_append_temp_snapshot() call fails, then the error code path tries
      to unref the already destroyed 'options' QDict.
      
      This can be reproduced easily by setting TMPDIR to a location where
      the QEMU process can't write:
      
         $ TMPDIR=/nonexistent $QEMU -drive driver=null-co,snapshot=on
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      8961be33
    • block/linux-aio: acquire AioContext before qemu_laio_process_completions · e091f0e9
      Sergio Lopez committed
      In qemu_laio_process_completions_and_submit, the AioContext is acquired
      before the ioq_submit iteration and after qemu_laio_process_completions,
      but the latter is not thread safe either.
      
      This change avoids a number of random crashes when the Main Thread and
      an IO Thread collide processing completions for the same AioContext.
      This is an example of such crash:
      
       - The IO Thread is trying to acquire the AioContext at aio_co_enter,
         which evidences that it didn't lock it before:
      
      Thread 3 (Thread 0x7fdfd8bd8700 (LWP 36743)):
       #0  0x00007fdfe0dd542d in __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
       #1  0x00007fdfe0dd0de6 in _L_lock_870 () at /lib64/libpthread.so.0
       #2  0x00007fdfe0dd0cdf in __GI___pthread_mutex_lock (mutex=mutex@entry=0x5631fde0e6c0)
          at ../nptl/pthread_mutex_lock.c:114
       #3  0x00005631fc0603a7 in qemu_mutex_lock_impl (mutex=0x5631fde0e6c0, file=0x5631fc23520f "util/async.c", line=511) at util/qemu-thread-posix.c:66
       #4  0x00005631fc05b558 in aio_co_enter (ctx=0x5631fde0e660, co=0x7fdfcc0c2b40) at util/async.c:493
       #5  0x00005631fc05b5ac in aio_co_wake (co=<optimized out>) at util/async.c:478
       #6  0x00005631fbfc51ad in qemu_laio_process_completion (laiocb=<optimized out>) at block/linux-aio.c:104
       #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
          at block/linux-aio.c:222
       #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
          at block/linux-aio.c:237
       #9  0x00005631fc05d978 in aio_dispatch_handlers (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:406
       #10 0x00005631fc05e3ea in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=true)
          at util/aio-posix.c:693
       #11 0x00005631fbd7ad96 in iothread_run (opaque=0x5631fde0e1c0) at iothread.c:64
       #12 0x00007fdfe0dcee25 in start_thread (arg=0x7fdfd8bd8700) at pthread_create.c:308
       #13 0x00007fdfe0afc34d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
      
       - The Main Thread is also processing completions from the same
         AioContext, and crashes due to a failed assertion at util/iov.c:78:
      
      Thread 1 (Thread 0x7fdfeb5eac80 (LWP 36740)):
       #0  0x00007fdfe0a391f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
       #1  0x00007fdfe0a3a8e8 in __GI_abort () at abort.c:90
       #2  0x00007fdfe0a32266 in __assert_fail_base (fmt=0x7fdfe0b84e68 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset")
          at assert.c:92
       #3  0x00007fdfe0a32312 in __GI___assert_fail (assertion=assertion@entry=0x5631fc238ccb "offset == 0", file=file@entry=0x5631fc23698e "util/iov.c", line=line@entry=78, function=function@entry=0x5631fc236adc <__PRETTY_FUNCTION__.15220> "iov_memset") at assert.c:101
       #4  0x00005631fc065287 in iov_memset (iov=<optimized out>, iov_cnt=<optimized out>, offset=<optimized out>, offset@entry=65536, fillc=fillc@entry=0, bytes=15515191315812405248) at util/iov.c:78
       #5  0x00005631fc065a63 in qemu_iovec_memset (qiov=<optimized out>, offset=offset@entry=65536, fillc=fillc@entry=0, bytes=<optimized out>) at util/iov.c:410
       #6  0x00005631fbfc5178 in qemu_laio_process_completion (laiocb=0x7fdd920df630) at block/linux-aio.c:88
       #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
          at block/linux-aio.c:222
       #8  0x00005631fbfc5499 in qemu_laio_process_completions_and_submit (s=0x7fdfc0297670)
          at block/linux-aio.c:237
       #9  0x00005631fbfc54ed in qemu_laio_poll_cb (opaque=<optimized out>) at block/linux-aio.c:272
       #10 0x00005631fc05d85e in run_poll_handlers_once (ctx=ctx@entry=0x5631fde0e660) at util/aio-posix.c:497
       #11 0x00005631fc05e2ca in aio_poll (blocking=false, ctx=0x5631fde0e660) at util/aio-posix.c:574
       #12 0x00005631fc05e2ca in aio_poll (ctx=0x5631fde0e660, blocking=blocking@entry=false)
          at util/aio-posix.c:604
       #13 0x00005631fbfcb8a3 in bdrv_do_drained_begin (ignore_parent=<optimized out>, recursive=<optimized out>, bs=<optimized out>) at block/io.c:273
       #14 0x00005631fbfcb8a3 in bdrv_do_drained_begin (bs=0x5631fe8b6200, recursive=<optimized out>, parent=0x0, ignore_bds_parents=<optimized out>, poll=<optimized out>) at block/io.c:390
       #15 0x00005631fbfbcd2e in blk_drain (blk=0x5631fe83ac80) at block/block-backend.c:1590
       #16 0x00005631fbfbe138 in blk_remove_bs (blk=blk@entry=0x5631fe83ac80) at block/block-backend.c:774
       #17 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:401
       #18 0x00005631fbfbe3d6 in blk_unref (blk=0x5631fe83ac80) at block/block-backend.c:449
       #19 0x00005631fbfc9a69 in commit_complete (job=0x5631fe8b94b0, opaque=0x7fdfcc1bb080)
          at block/commit.c:92
       #20 0x00005631fbf7d662 in job_defer_to_main_loop_bh (opaque=0x7fdfcc1b4560) at job.c:973
       #21 0x00005631fc05ad41 in aio_bh_poll (bh=0x7fdfcc01ad90) at util/async.c:90
       #22 0x00005631fc05ad41 in aio_bh_poll (ctx=ctx@entry=0x5631fddffdb0) at util/async.c:118
       #23 0x00005631fc05e210 in aio_dispatch (ctx=0x5631fddffdb0) at util/aio-posix.c:436
       #24 0x00005631fc05ac1e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>, user_data=<optimized out>) at util/async.c:261
       #25 0x00007fdfeaae44c9 in g_main_context_dispatch (context=0x5631fde00140) at gmain.c:3201
       #26 0x00007fdfeaae44c9 in g_main_context_dispatch (context=context@entry=0x5631fde00140) at gmain.c:3854
       #27 0x00005631fc05d503 in main_loop_wait () at util/main-loop.c:215
       #28 0x00005631fc05d503 in main_loop_wait (timeout=<optimized out>) at util/main-loop.c:238
       #29 0x00005631fc05d503 in main_loop_wait (nonblocking=nonblocking@entry=0) at util/main-loop.c:497
       #30 0x00005631fbd81412 in main_loop () at vl.c:1866
       #31 0x00005631fbc18ff3 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>)
          at vl.c:4647
      
       - A closer examination shows that s->io_q.in_flight appears to have
         gone backwards:
      
      (gdb) frame 7
       #7  0x00005631fbfc523c in qemu_laio_process_completions (s=s@entry=0x7fdfc0297670)
          at block/linux-aio.c:222
      222	            qemu_laio_process_completion(laiocb);
      (gdb) p s
      $2 = (LinuxAioState *) 0x7fdfc0297670
      (gdb) p *s
      $3 = {aio_context = 0x5631fde0e660, ctx = 0x7fdfeb43b000, e = {rfd = 33, wfd = 33}, io_q = {plugged = 0,
          in_queue = 0, in_flight = 4294967280, blocked = false, pending = {sqh_first = 0x0,
            sqh_last = 0x7fdfc0297698}}, completion_bh = 0x7fdfc0280ef0, event_idx = 21, event_max = 241}
      (gdb) p/x s->io_q.in_flight
      $4 = 0xfffffff0
      Signed-off-by: Sergio Lopez <slp@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      e091f0e9
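      The underlying rule can be sketched standalone: both threads must hold the
      same lock around the whole completion processing, not just around the
      submission part (illustrative names; a mutex stands in for the AioContext
      lock, a plain counter for io_q.in_flight):

      #include <pthread.h>

      static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* AioContext lock */
      static int in_flight;            /* shared state; went to 0xfffffff0 in the crash */

      static void process_completions(int nevents)
      {
          for (int i = 0; i < nevents; i++) {
              in_flight--;             /* only safe under ctx_lock */
          }
      }

      /* Both the main thread and the iothread call this; the fix is to take the
       * lock around the whole processing, not only around the submission part. */
      static void process_completions_and_submit(int nevents)
      {
          pthread_mutex_lock(&ctx_lock);
          process_completions(nevents);
          /* io_submit()-style resubmission would also happen here, still locked */
          pthread_mutex_unlock(&ctx_lock);
      }

      static void *iothread(void *arg)
      {
          (void)arg;
          process_completions_and_submit(2);
          return NULL;
      }

      int main(void)
      {
          pthread_t t;
          in_flight = 4;
          pthread_create(&t, NULL, iothread, NULL);
          process_completions_and_submit(2);   /* "main thread" races the iothread */
          pthread_join(t, NULL);
          return in_flight;                    /* 0 with locking; garbage without */
      }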
    • qemu-iotests: Test commit with top-node/base-node · d57177a4
      Kevin Wolf committed
      This adds some tests for block-commit with the new options top-node and
      base-node (taking node names) instead of top and base (taking file
      names).
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      d57177a4
    • commit: Add top-node/base-node options · 3c605f40
      Kevin Wolf committed
      The block-commit QMP command required specifying the top and base nodes
      of the commit job using the file names of those nodes. While this works
      in simple cases (local files with absolute paths), the file names
      generated for more complicated setups can be hard to predict.
      
      The block-commit command has more problems than just this, so we want to
      replace it altogether in the long run, but libvirt needs a reliable way
      to address nodes now. So we don't want to wait for a new, cleaner
      command, but just add the minimal thing needed right now.
      
      This adds two new options top-node and base-node to the command, which
      allow specifying node names instead. They are mutually exclusive with
      the old options.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      3c605f40
    • blockdev: document transactional shortcomings · 66da04dd
      John Snow committed
      Presently only the backup job really guarantees what one would consider
      transactional semantics. To guard against someone helpfully adding them
      in the future, document that there are shortcomings in the model that
      would need to be audited at that time.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 20180906130225.5118-17-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      66da04dd
    • block/backup: qapi documentation fixup · dfaff2c3
      John Snow committed
      Fix documentation to match the other jobs amended for 3.1.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-16-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      dfaff2c3
    • qapi/block-stream: expose new job properties · 241ca1ab
      John Snow committed
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-15-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      241ca1ab
    • qapi/block-mirror: expose new job properties · a6b58ade
      John Snow committed
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-14-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      a6b58ade
    • qapi/block-commit: expose new job properties · 96fbf534
      John Snow committed
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-13-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      96fbf534
    • jobs: remove .exit callback · ccbfb331
      John Snow committed
      Now that all of the jobs use the component finalization callbacks,
      there's no use for the heavy-hammer .exit callback anymore.
      
      job_exit becomes a glorified type shim so that we can call
      job_completed from aio_bh_schedule_oneshot.
      
      Move these three functions down into job.c to eliminate a
      forward reference.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-12-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      ccbfb331
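      The 'glorified type shim' amounts to an argument-type conversion, sketched
      here standalone (made-up names; bottom-half APIs of the
      aio_bh_schedule_oneshot() kind take a void (*)(void *) callback):

      #include <stdio.h>

      typedef void BHFunc(void *opaque);       /* shape of a bottom-half callback */

      typedef struct JobModel {
          const char *id;
      } JobModel;

      static void job_completed_model(JobModel *job)
      {
          printf("finalizing job %s\n", job->id);  /* .prepare/.commit/.abort/.clean */
      }

      /* The shim: nothing but an argument-type conversion. */
      static void job_exit_shim(void *opaque)
      {
          job_completed_model(opaque);
      }

      /* Stand-in for aio_bh_schedule_oneshot(ctx, cb, opaque): here we just call
       * the callback immediately instead of deferring it to the main loop. */
      static void schedule_oneshot(BHFunc *cb, void *opaque)
      {
          cb(opaque);
      }

      int main(void)
      {
          JobModel job = { "backup-0" };
          schedule_oneshot(job_exit_shim, &job);
          return 0;
      }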
    • tests/test-blockjob-txn: move .exit to .clean · e4dad427
      John Snow committed
      The exit callback in this test actually only performs cleanup.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-11-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      e4dad427
    • tests/test-blockjob: remove exit callback · 977d26fd
      John Snow committed
      We remove the exit callback and the completed boolean along with it.
      We can simulate it just fine by waiting for the job to defer to the
      main loop, and then giving it one final kick to get the main loop
      portion to run.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-10-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      977d26fd
    • tests/blockjob: replace Blockjob with Job · 0cc4643b
      John Snow committed
      These tests don't actually test blockjobs anymore, they test
      generic Job lifetimes. Change the types accordingly.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-9-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      0cc4643b
    • block/stream: refactor stream to use job callbacks · 1b57488a
      John Snow committed
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-8-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      1b57488a
    • block/mirror: conservative mirror_exit refactor · 737efc1e
      John Snow committed
      For purposes of minimum code movement, refactor the mirror_exit
      callback to use the post-finalization callbacks in a trivial way.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 20180906130225.5118-7-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      [mreitz: Added comment for the mirror_exit() function]
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      737efc1e
    • block/mirror: don't install backing chain on abort · c2924cea
      John Snow committed
      In cases where we abort the block/mirror job, there's no point in
      installing the new backing chain before we finish aborting.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Message-id: 20180906130225.5118-6-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      c2924cea
    • block/commit: refactor commit to use job callbacks · 22dffcbe
      John Snow committed
      Use the component callbacks; prepare, abort, and clean.
      
      NB: prepare is only called when the job has not yet failed;
      and abort can be called after prepare.
      
      complete -> prepare -> abort -> clean
      complete -> abort -> clean
      
      During the refactor, a potential problem with bdrv_drop_intermediate
      was identified; the patched behavior is no worse than the pre-patch
      behavior, so a FIXME is left for now, to be fixed in a future patch.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-5-jsnow@redhat.com
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      22dffcbe
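      The two sequences listed above correspond to a finalization routine
      roughly like the following standalone sketch (illustrative names; the real
      job.c logic spreads these steps across the transaction code):

      #include <stdio.h>

      typedef struct JobModel JobModel;

      typedef struct DriverModel {
          int  (*prepare)(JobModel *job);   /* may fail; only called on success path */
          void (*commit)(JobModel *job);
          void (*abort)(JobModel *job);     /* can run after prepare */
          void (*clean)(JobModel *job);     /* always runs */
      } DriverModel;

      static void finalize(const DriverModel *d, JobModel *job, int ret)
      {
          if (ret == 0 && d->prepare) {
              ret = d->prepare(job);        /* complete -> prepare ... */
          }
          if (ret == 0) {
              if (d->commit) {
                  d->commit(job);           /* ... -> commit */
              }
          } else {
              if (d->abort) {
                  d->abort(job);            /* ... -> abort (also reached directly
                                             * when .run already failed) */
              }
          }
          if (d->clean) {
              d->clean(job);                /* -> clean, unconditionally */
          }
      }

      static int  prep_ok(JobModel *j)   { (void)j; puts("prepare"); return 0; }
      static void do_commit(JobModel *j) { (void)j; puts("commit"); }
      static void do_abort(JobModel *j)  { (void)j; puts("abort"); }
      static void do_clean(JobModel *j)  { (void)j; puts("clean"); }

      int main(void)
      {
          DriverModel d = { prep_ok, do_commit, do_abort, do_clean };
          finalize(&d, NULL, 0);    /* prints: prepare, commit, clean */
          finalize(&d, NULL, -1);   /* prints: abort, clean */
          return 0;
      }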
    • block/stream: add block job creation flags · cf6320df
      John Snow committed
      Add support for taking and passing forward job creation flags.
      Signed-off-by: John Snow <jsnow@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Message-id: 20180906130225.5118-4-jsnow@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      cf6320df