1. 04 June 2019, 2 commits
    • block/io: Delay decrementing the quiesce_counter · 5cb2737e
      Max Reitz committed
      When ending a drained section, bdrv_do_drained_end() currently first
      decrements the quiesce_counter, and only then actually ends the drain.
      
      The bdrv_drain_invoke(bs, false) call may cause graph changes.  Say the
      graph change involves replacing an existing BB's ("blk") BDS
      (blk_bs(blk)) by @bs.  Let us introduce the following values:
      - bs_oqc = old_quiesce_counter
        (so bs->quiesce_counter == bs_oqc - 1)
      - obs_qc = blk_bs(blk)->quiesce_counter (before bdrv_drain_invoke())
      
      Let us assume there is no blk_pread_unthrottled() involved, so
      blk->quiesce_counter == obs_qc (before bdrv_drain_invoke()).
      
      Now replacing blk_bs(blk) by @bs will reduce blk->quiesce_counter by
      obs_qc (making it 0) and increase it by bs_oqc-1 (making it bs_oqc-1).
      
      bdrv_drain_invoke() returns and we invoke bdrv_parent_drained_end().
      This will decrement blk->quiesce_counter by one, so it would be -1 --
      were there not an assertion against that in blk_root_drained_end().
      
      We therefore have to keep the quiesce_counter up at least until
      bdrv_drain_invoke() returns, so that bdrv_parent_drained_end() does the
      right thing for the parents @bs got during bdrv_drain_invoke().
      
      But let us delay it even further, namely until bdrv_parent_drained_end()
      returns, because then it mirrors bdrv_do_drained_begin(): There, we
      first increment the quiesce_counter, then begin draining the parents,
      and then call bdrv_drain_invoke().  It makes sense to let
      bdrv_do_drained_end() unravel this exactly in reverse.
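      
      For illustration, the resulting order looks roughly like this (a
      simplified sketch with the recursion and ignore parameters trimmed;
      not the exact QEMU code):
      
        static void bdrv_do_drained_end(BlockDriverState *bs)
        {
            int old_quiesce_counter;
      
            assert(bs->quiesce_counter > 0);
      
            /* Re-enable things in the driver; this may change the graph and
             * give @bs new parents. */
            bdrv_drain_invoke(bs, false);
      
            /* End the drain in all parents, including any gained above. */
            bdrv_parent_drained_end(bs);
      
            /* Only now drop the counter, mirroring bdrv_do_drained_begin()
             * exactly in reverse. */
            old_quiesce_counter = atomic_fetch_dec(&bs->quiesce_counter);
            if (old_quiesce_counter == 1) {
                aio_enable_external(bdrv_get_aio_context(bs));
            }
        }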
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: avoid recursive block_status call if possible · 69f47505
      Vladimir Sementsov-Ogievskiy committed
      Since commit 5daa74a6, bdrv_co_block_status() digs into bs->file for an
      additional, more accurate search for holes inside regions reported as
      DATA by bs.
      
      This accuracy is not free: assume we have a qcow2 disk. qcow2 itself
      already knows where the holes are and where the data is, but every
      block_status request additionally calls lseek on the underlying file.
      For a big disk full of data, any iterative copying block job (or img
      convert) will call lseek(HOLE) on every iteration, and each of these
      lseeks has to iterate through all metadata up to the end of the file.
      This is obviously inefficient behavior, and in many scenarios we don't
      need this lseek at all.
      
      However, lseek is still needed when we have a metadata-preallocated image.
      
      So, let's detect the metadata-preallocation case and avoid digging into
      qcow2's protocol file in the other cases.
      
      The idea is to compare the allocation size from the filesystem's point of
      view with the allocation size from qcow2's point of view (derived from the
      refcounts). If the filesystem-level allocation is significantly lower,
      consider it a metadata-preallocation case.
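      
      A rough sketch of such a detector (hypothetical helper name and an
      illustrative threshold; the real qcow2_detect_metadata_preallocation()
      derives the qcow2-side cluster count from the refcount table):
      
        static bool detect_metadata_preallocation(BlockDriverState *bs,
                                                  uint64_t qcow2_allocated_clusters,
                                                  uint64_t cluster_size)
        {
            /* How much did the filesystem actually allocate for the data file? */
            int64_t real_allocation = bdrv_get_allocated_file_size(bs->file->bs);
            int64_t real_clusters;
      
            if (real_allocation < 0) {
                return false;   /* unknown: assume no metadata preallocation */
            }
      
            real_clusters = real_allocation / cluster_size;
      
            /* Far fewer bytes on disk than qcow2 considers allocated:
             * this looks like metadata preallocation, so keep the lseek. */
            return qcow2_allocated_clusters &&
                   real_clusters < qcow2_allocated_clusters / 2;
        }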
      
      iotest 102 changed, as our detector cannot recognize a shrunk file as
      metadata preallocation. That does not seem wrong, since with metadata
      preallocation we always have a valid file length.
      
      Two other iotests have a slight change in their QMP output sequence:
      Active 'block-commit' returns earlier because the job coroutine yields
      earlier on a blocking operation. This operation is loading the refcount
      blocks in qcow2_detect_metadata_preallocation().
      Suggested-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2. 20 May 2019, 1 commit
3. 10 May 2019, 2 commits
4. 26 March 2019, 2 commits
5. 22 February 2019, 1 commit
6. 01 February 2019, 1 commit
    • block: Fix hangs in synchronous APIs with iothreads · 4720cbee
      Kevin Wolf committed
      In the block layer, synchronous APIs are often implemented by creating a
      coroutine that calls the asynchronous coroutine-based implementation and
      then waiting for completion with BDRV_POLL_WHILE().
      
      For this to work with iothreads (more specifically, when the synchronous
      API is called in a thread that is not the home thread of the block
      device, so that the coroutine will run in a different thread), we must
      make sure to call aio_wait_kick() at the end of the operation. Many
      places are missing this, so that BDRV_POLL_WHILE() keeps hanging even if
      the condition has long become false.
      
      Note that bdrv_dec_in_flight() involves an aio_wait_kick() call. This
      corresponds to the BDRV_POLL_WHILE() in the drain functions, but it is
      generally not enough for most other operations because they haven't set
      the return value in the coroutine entry stub yet. To avoid race
      conditions there, we need to kick after setting the return value.
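      
      The pattern looks roughly like this (a sketch with hypothetical
      bdrv_foo()/bdrv_co_foo() names, not a specific function from the patch):
      
        typedef struct FooCoData {
            BlockDriverState *bs;
            int ret;
            bool done;
        } FooCoData;
      
        static void coroutine_fn bdrv_foo_co_entry(void *opaque)
        {
            FooCoData *data = opaque;
      
            data->ret = bdrv_co_foo(data->bs);  /* hypothetical coroutine-based impl. */
            data->done = true;
            /* Wake up the BDRV_POLL_WHILE() below even if it polls in another
             * thread; without this kick it can hang although done is true. */
            aio_wait_kick();
        }
      
        int bdrv_foo(BlockDriverState *bs)
        {
            FooCoData data = { .bs = bs, .done = false };
            Coroutine *co = qemu_coroutine_create(bdrv_foo_co_entry, &data);
      
            bdrv_coroutine_enter(bs, co);       /* run in the node's AioContext */
            BDRV_POLL_WHILE(bs, !data.done);
            return data.ret;
        }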
      
      The race window is small enough that the problem doesn't usually surface
      in the common path. However, it does surface and causes easily
      reproducible hangs if the operation can return early before even calling
      bdrv_inc/dec_in_flight, which many of them do (trivial error or no-op
      success paths).
      
      The bug in bdrv_truncate(), bdrv_check() and bdrv_invalidate_cache() is
      slightly different: These functions even neglected to schedule the
      coroutine in the home thread of the node. This avoids the hang, but is
      obviously wrong, too. Fix those to schedule the coroutine in the right
      AioContext in addition to adding aio_wait_kick() calls.
      
      Cc: qemu-stable@nongnu.org
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
7. 25 September 2018, 3 commits
    • block: Use a single global AioWait · cfe29d82
      Kevin Wolf committed
      When draining a block node, we recurse to its parent and for subtree
      drains also to its children. A single AIO_WAIT_WHILE() is then used to
      wait for bdrv_drain_poll() to become true, which depends on all of the
      nodes we recursed to. However, if the respective child or parent becomes
      quiescent and calls bdrv_wakeup(), only the AioWait of the child/parent
      is checked, while AIO_WAIT_WHILE() depends on the AioWait of the
      original node.
      
      Fix this by using a single AioWait for all callers of AIO_WAIT_WHILE().
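      
      Conceptually the fix amounts to something like this (a sketch of the
      idea; names follow the aio-wait code, but this is not the literal patch):
      
        static AioWait global_aio_wait;    /* the one and only waiter record */
      
        static void dummy_bh_cb(void *opaque)
        {
            /* Nothing to do; scheduling the BH is enough to re-evaluate the
             * AIO_WAIT_WHILE() condition in the waiting thread. */
        }
      
        void aio_wait_kick(void)
        {
            /* Whoever might have changed any AIO_WAIT_WHILE() condition kicks
             * the single global waiter instead of a per-node AioWait. */
            if (atomic_read(&global_aio_wait.num_waiters)) {
                aio_bh_schedule_oneshot(qemu_get_aio_context(), dummy_bh_cb, NULL);
            }
        }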
      
      This may mean that the draining thread gets a few more unnecessary
      wakeups because an unrelated operation got completed, but we already
      wake it up when something _could_ have changed rather than only if it
      has certainly changed.
      
      Apart from that, drain is a slow path anyway. In theory it would be
      possible to use wakeups more selectively and still correctly, but the
      gains are likely not worth the additional complexity. In fact, this
      patch is a nice simplification for some places in the code.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
    • block: Remove aio_poll() in bdrv_drain_poll variants · 4cf077b5
      Kevin Wolf committed
      bdrv_drain_poll_top_level() was buggy because it didn't release the
      AioContext lock of the node to be drained before calling aio_poll().
      This way, callbacks called by aio_poll() would possibly take the lock a
      second time and run into a deadlock with a nested AIO_WAIT_WHILE() call.
      
      However, it turns out that the aio_poll() call isn't actually needed any
      more. It was introduced in commit 91af091f, which is effectively
      reverted by this patch. The cases it was supposed to fix are now covered
      by bdrv_drain_poll(), which waits for block jobs to reach a quiescent
      state.
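      
      After this change the top-level helper reduces to roughly the following
      (sketch):
      
        static bool bdrv_drain_poll_top_level(BlockDriverState *bs, bool recursive,
                                              BdrvChild *ignore_parent)
        {
            /* An aio_poll(bdrv_get_aio_context(bs), false) call used to sit here
             * while the AioContext lock was still held, which could deadlock
             * against a nested AIO_WAIT_WHILE().  The polling now happens only
             * in the caller's BDRV_POLL_WHILE(), for which this function merely
             * provides the condition. */
            return bdrv_drain_poll(bs, recursive, ignore_parent, false);
        }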
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
    • block: Add missing locking in bdrv_co_drain_bh_cb() · aa1361d5
      Kevin Wolf committed
      bdrv_do_drained_begin/end() assume that they are called with the
      AioContext lock of bs held. If we call drain functions from a coroutine
      with the AioContext lock held, we yield and schedule a BH to move out of
      coroutine context. This means that the lock for the home context of the
      coroutine is released and must be re-acquired in the bottom half.
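      
      The bottom half therefore brackets the drain with the lock of the node's
      AioContext, roughly like this (sketch; parameter lists are trimmed and
      error paths omitted):
      
        static void bdrv_co_drain_bh_cb(void *opaque)
        {
            BdrvCoDrainData *data = opaque;
            BlockDriverState *bs = data->bs;
            AioContext *ctx = bdrv_get_aio_context(bs);
      
            /* The coroutine released this lock when it yielded; re-acquire it
             * before touching the node. */
            aio_context_acquire(ctx);
            bdrv_dec_in_flight(bs);
            if (data->begin) {
                bdrv_do_drained_begin(bs, data->recursive, data->parent);
            } else {
                bdrv_do_drained_end(bs, data->recursive, data->parent);
            }
            aio_context_release(ctx);
      
            data->done = true;
            aio_co_wake(data->co);    /* reenter the original coroutine */
        }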
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
8. 10 July 2018, 14 commits
9. 03 July 2018, 2 commits
10. 29 June 2018, 4 commits
11. 18 June 2018, 8 commits
    • block: Allow graph changes in bdrv_drain_all_begin/end sections · 0f12264e
      Kevin Wolf committed
      bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
      did a subtree drain for each of them. This works fine as long as the
      graph is static, but sadly, reality looks different.
      
      If the graph changes so that root nodes are added or removed, we would
      have to compensate for this. bdrv_next() returns each root node only
      once even if it's the root node for multiple BlockBackends or for a
      monitor-owned block driver tree, which would only complicate things.
      
      The much easier and more obviously correct way is to fundamentally
      change the way the functions work: Iterate over all BlockDriverStates,
      no matter who owns them, and drain them individually. Compensation is
      only necessary when a new BDS is created inside a drain_all section.
      Removal of a BDS doesn't require any action because it's gone afterwards
      anyway.
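      
      The begin side then becomes a loop over every node, roughly (sketch;
      the coroutine bounce, the final AIO_WAIT_WHILE() and the drain
      parameters are abbreviated):
      
        void bdrv_drain_all_begin(void)
        {
            BlockDriverState *bs = NULL;
      
            bdrv_drain_all_count++;   /* lets node creation compensate inside the section */
      
            /* Quiesce every BlockDriverState individually, no matter who owns it. */
            while ((bs = bdrv_next_all_states(bs))) {
                AioContext *ctx = bdrv_get_aio_context(bs);
      
                aio_context_acquire(ctx);
                bdrv_do_drained_begin(bs, false, NULL);  /* no subtree recursion */
                aio_context_release(ctx);
            }
      
            /* A single polling phase for all nodes follows here. */
        }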
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: ignore_bds_parents parameter for drain functions · 6cd5c9d7
      Kevin Wolf committed
      In the future, bdrv_drain_all_begin/end() will drain all individual
      nodes separately rather than whole subtrees. This means that we don't
      want to propagate the drain to all parents any more: If the parent is a
      BDS, it will already be drained separately. Recursing to all parents is
      unnecessary work and would make it an O(n²) operation.
      
      Prepare the drain function for the changed drain_all by adding an
      ignore_bds_parents parameter to the internal implementation that
      prevents the propagation of the drain to BDS parents. We still (have to)
      propagate it to non-BDS parents like BlockBackends or Jobs because those
      are not drained separately.
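      
      In the parent propagation this boils down to a check like the following
      (sketch; it assumes the child role carries a parent_is_bds flag, as this
      series adds):
      
        void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
                                       bool ignore_bds_parents)
        {
            BdrvChild *c, *next;
      
            QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
                if (c == ignore ||
                    (ignore_bds_parents && c->role->parent_is_bds)) {
                    /* BDS parents are drained as nodes in their own right by
                     * drain_all; skipping them here avoids O(n^2) recursion. */
                    continue;
                }
                if (c->role->drained_begin) {
                    c->role->drained_begin(c);
                }
            }
        }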
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Move bdrv_drain_all_begin() out of coroutine context · c8ca33d0
      Kevin Wolf committed
      Before we can introduce a single polling loop for all nodes in
      bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
      context like we already do for bdrv_do_drained_begin().
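      
      The usual pattern for this in the drain code is to bounce through a
      bottom half when entered from a coroutine, for example (sketch; the
      helper's argument list is simplified here):
      
        void bdrv_drain_all_begin(void)
        {
            if (qemu_in_coroutine()) {
                /* Schedule a BH outside of coroutine context and yield until
                 * it has done the actual draining. */
                bdrv_co_yield_to_drain(NULL, true);
                return;
            }
      
            /* From here on a nested event loop (AIO_WAIT_WHILE) can run
             * safely, since we are not sitting on some coroutine's stack. */
        }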
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Defer .bdrv_drain_begin callback to polling phase · 0109e7e6
      Kevin Wolf committed
      We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
      done with propagating the drain through the graph and are doing the
      single final BDRV_POLL_WHILE().
      
      Just schedule the coroutine with the callback and increase bs->in_flight
      to make sure that the polling phase will wait for it.
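      
      Schematically (a sketch of the begin side only; the end side still polls
      for the callback to finish):
      
        static void coroutine_fn bdrv_drain_invoke_entry(void *opaque)
        {
            BdrvCoDrainData *data = opaque;
            BlockDriverState *bs = data->bs;
      
            bs->drv->bdrv_co_drain_begin(bs);
      
            bdrv_dec_in_flight(bs);   /* now the polling phase can complete */
            g_free(data);
        }
      
        static void bdrv_drain_invoke(BlockDriverState *bs, bool begin)
        {
            BdrvCoDrainData *data;
      
            if (!bs->drv || !begin || !bs->drv->bdrv_co_drain_begin) {
                return;               /* (sketch handles only the begin side) */
            }
      
            data = g_new0(BdrvCoDrainData, 1);
            data->bs = bs;
      
            /* Count the driver callback as an in-flight operation so that the
             * final BDRV_POLL_WHILE() waits for it instead of polling here. */
            bdrv_inc_in_flight(bs);
            data->co = qemu_coroutine_create(bdrv_drain_invoke_entry, data);
            aio_co_schedule(bdrv_get_aio_context(bs), data->co);
        }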
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Don't poll in parent drain callbacks · dcf94a23
      Kevin Wolf committed
      bdrv_do_drained_begin() is only safe if we have a single
      BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
      that parent callbacks introduce a nested polling loop that could cause
      graph changes while we're traversing the graph.
      
      Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
      node without waiting for its requests to complete. These requests will
      be waited for in the BDRV_POLL_WHILE() call down the call chain.
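      
      The split roughly looks like this (sketch; recursion, coroutine handling
      and the ignore parameters are left out):
      
        /* Quiesce a single node; no polling, so no graph changes can happen
         * behind the caller's back. */
        void bdrv_do_drained_begin_quiesce(BlockDriverState *bs, BdrvChild *parent)
        {
            assert(!qemu_in_coroutine());
      
            if (atomic_fetch_inc(&bs->quiesce_counter) == 0) {
                aio_disable_external(bdrv_get_aio_context(bs));
            }
      
            bdrv_parent_drained_begin(bs, parent);
            bdrv_drain_invoke(bs, true);
        }
      
        static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent)
        {
            bdrv_do_drained_begin_quiesce(bs, parent);
      
            /* The only nested event loop: wait for the whole affected subtree
             * at once, further down the call chain. */
            BDRV_POLL_WHILE(bs, bdrv_drain_poll_top_level(bs, false, parent));
        }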
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Drain recursively with a single BDRV_POLL_WHILE() · fe4f0614
      Kevin Wolf committed
      Anything can happen inside BDRV_POLL_WHILE(), including graph
      changes that may interfere with its callers (e.g. child list iteration
      in recursive callers of bdrv_do_drained_begin).
      
      Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
      end of bdrv_do_drained_begin() to avoid such effects. The recursion
      happens now inside the loop condition. As the graph can only change
      between bdrv_drain_poll() calls, but not inside of it, doing the
      recursion here is safe.
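      
      The condition side then looks roughly like this (sketch; the parameter
      lists are slightly simplified):
      
        /* true as long as anything in the subtree still needs to quiesce */
        bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
                             BdrvChild *ignore_parent)
        {
            BdrvChild *child, *next;
      
            if (bdrv_parent_drained_poll(bs, ignore_parent)) {
                return true;
            }
      
            if (atomic_read(&bs->in_flight)) {
                return true;
            }
      
            if (recursive) {
                /* The graph can only change between evaluations of the
                 * BDRV_POLL_WHILE() condition, not while this runs, so
                 * recursing here is safe. */
                QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
                    if (bdrv_drain_poll(child->bs, recursive, child)) {
                        return true;
                    }
                }
            }
      
            return false;
        }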
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Remove bdrv_drain_recurse() · d30b8e64
      Kevin Wolf committed
      For bdrv_drain(), recursively waiting for child node requests is
      pointless because we didn't quiesce their parents, so new requests could
      come in anyway. Letting the function work only on a single node makes it
      more consistent.
      
      For subtree drains and drain_all, we already have the recursion in
      bdrv_do_drained_begin(), so the extra recursion doesn't add anything
      either.
      
      Remove the useless code.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Really pause block jobs on drain · 89bd0305
      Kevin Wolf committed
      We already requested that block jobs be paused in .bdrv_drained_begin,
      but no guarantee was made that the job was actually inactive at the
      point where bdrv_drained_begin() returned.
      
      This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
      uses it to make bdrv_drain_poll() consider block jobs using the node to
      be drained.
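      
      The job side of the new callback is essentially a predicate like the
      following (an illustrative sketch, not the exact upstream check; the
      results of all parents are then OR'ed together by the drain poll loop):
      
        static bool child_job_drained_poll(BdrvChild *c)
        {
            BlockJob *bjob = c->opaque;
            Job *job = &bjob->job;
      
            /* A job that is paused or not busy cannot submit new requests,
             * so the drained parent may consider it quiescent. */
            if (!job->busy || job->paused || job->deferred_to_main_loop) {
                return false;
            }
      
            /* Otherwise bdrv_drain_poll() has to keep waiting for us. */
            return true;
        }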
      
      For the test case to work as expected, we have to switch from
      block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
      considered active and must be waited for when draining the node.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>