1. 10 Jul 2018, 1 commit
    • block: Poll after drain on attaching a node · 4be6a6d1
      Kevin Wolf authored
      Commit dcf94a23 ('block: Don't poll in parent drain callbacks')
      removed polling in bdrv_child_cb_drained_begin() on the grounds that the
      original bdrv_drain() will already poll and BdrvChildRole.drained_begin
      calls must not cause graph changes (and therefore must not call
      aio_poll(), or the recursion through the graph would break).
      
      This reasoning is correct for calls through bdrv_do_drained_begin().
      However, BdrvChildRole.drained_begin is also called when a node that is
      already in a drained section (i.e. bdrv_do_drained_begin() has already
      returned and therefore can't poll any more) is attached to a new parent.
      In this case, we must explicitly poll to have all requests completed
      before the drained new child can be attached to the parent.
      
      In bdrv_replace_child_noperm(), we know that we're not inside the
      recursion of bdrv_do_drained_begin() because graph changes are not
      allowed there, and bdrv_replace_child_noperm() is a graph change. The
      call of BdrvChildRole.drained_begin() must therefore be followed by a
      BDRV_POLL_WHILE() that waits for the completion of requests.
      Reported-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
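
      A minimal sketch of the pattern described here (not the actual
      bdrv_replace_child_noperm() code; the helper name and the exact poll
      condition are illustrative):

          /* Sketch: attach an already-drained child to a new parent.  The
           * parent's drained_begin callback only quiesces and must not poll,
           * and since we are outside bdrv_do_drained_begin() nobody else
           * will poll for us, so wait for in-flight requests explicitly. */
          static void attach_drained_child_sketch(BlockDriverState *bs,
                                                  BdrvChild *child)
          {
              if (child->role->drained_begin) {
                  child->role->drained_begin(child);  /* quiesce only */
              }
              BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
          }
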
  2. 05 Jul 2018, 2 commits
  3. 29 Jun 2018, 2 commits
    • block: Move bdrv_truncate() implementation to io.c · 3d9f2d2a
      Kevin Wolf authored
      This moves the bdrv_truncate() implementation from block.c to block/io.c
      so it can have access to the tracked requests infrastructure.
      
      This involves making refresh_total_sectors() public (in block_int.h).
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert .bdrv_truncate callback to coroutine_fn · 061ca8a3
      Kevin Wolf authored
      bdrv_truncate() is an operation that can block (even for a quite long
      time, depending on the PreallocMode) in I/O paths that shouldn't block.
      Convert it to a coroutine_fn so that we have the infrastructure for
      drivers to make their .bdrv_co_truncate implementation asynchronous.
      
      This change could potentially introduce new race conditions because
      bdrv_truncate() isn't necessarily executed atomically any more. Whether
      this is a problem needs to be evaluated for each block driver that
      supports truncate:
      
      * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
        protocol drivers are trivially safe because they don't actually yield
        yet, so there is no change in behaviour.
      
      * copy-on-read, crypto, raw-format: Essentially just filter drivers that
        pass the request to a child node, no problem.
      
      * qcow2: The implementation modifies metadata, so it needs to hold
        s->lock to be safe with concurrent I/O requests. In order to avoid
        double locking, this requires pulling the locking out into
        preallocate_co() and using qcow2_write_caches() instead of
        bdrv_flush().
      
      * qed: Does a single header update, this is fine without locking.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
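
      A hedged sketch of what a coroutine-based truncate callback with
      driver-level locking can look like (the driver state type, its fields
      and the preallocation handling are placeholders, not the actual qcow2
      code, and the callback signature may differ between QEMU versions):

          typedef struct BDRVExampleState {
              CoMutex lock;           /* protects metadata updates */
              int64_t virtual_size;
          } BDRVExampleState;

          static int coroutine_fn example_co_truncate(BlockDriverState *bs,
                                                      int64_t offset,
                                                      PreallocMode prealloc,
                                                      Error **errp)
          {
              BDRVExampleState *s = bs->opaque;

              /* Taking a CoMutex yields instead of blocking the thread, so
               * concurrent I/O coroutines keep running while we wait. */
              qemu_co_mutex_lock(&s->lock);
              if (prealloc != PREALLOC_MODE_OFF) {
                  /* Preallocation would happen here; it may yield while
                   * holding s->lock, which is why the lock is taken once,
                   * outside any helper that also wants it. */
              }
              s->virtual_size = offset;
              qemu_co_mutex_unlock(&s->lock);
              return 0;
          }
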
  4. 18 Jun 2018, 6 commits
    • block: Generalize should_update_child() rule · ec9f10fe
      Max Reitz authored
      Currently, bdrv_replace_node() refuses to create loops from one BDS to
      itself if the BDS to be replaced is the backing node of the BDS to
      replace it: Say there is a node A and a node B.  Replacing B by A means
      making all references to B point to A.  If B is a child of A (i.e. A has
      a reference to B), that would mean we would have to make this reference
      point to A itself -- so we'd create a loop.
      
      bdrv_replace_node() (through should_update_child()) refuses to do so if
      B is the backing node of A.  There is no reason why we should create
      loops if B is not the backing node of A, though.  The BDS graph should
      never contain loops, so we should always refuse to create them.
      
      If B is a child of A and B is to be replaced by A, we should simply
      leave B in place there because it is the most sensible choice.
      
      A more specific argument would be: Putting filter drivers into the BDS
      graph is basically the same as appending an overlay to a backing chain.
      But the main child BDS of a filter driver is not "backing" but "file",
      so restricting the no-loop rule to backing nodes would fail here.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20180613181823.13618-7-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
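
      A simplified sketch of the generalized rule (the real
      should_update_child() in block.c also handles indirect children; this
      version only shows the direct-child case and the helper name is
      illustrative):

          /* When replacing some node by 'to', leave any edge alone that is
           * one of to's own child references, whether it is "backing",
           * "file" or anything else: rewriting it to point at 'to' would
           * create a loop in the graph. */
          static bool should_update_child_sketch(BdrvChild *c,
                                                 BlockDriverState *to)
          {
              BdrvChild *child;

              QLIST_FOREACH(child, &to->children, next) {
                  if (child == c) {
                      return false;   /* keep B in place under A */
                  }
              }
              return true;
          }
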
    • block: Allow graph changes in bdrv_drain_all_begin/end sections · 0f12264e
      Kevin Wolf authored
      bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
      did a subtree drain for each of them. This works fine as long as the
      graph is static, but sadly, reality looks different.
      
      If the graph changes so that root nodes are added or removed, we would
      have to compensate for this. bdrv_next() returns each root node only
      once even if it's the root node for multiple BlockBackends or for a
      monitor-owned block driver tree, which would only complicate things.
      
      The much easier and more obviously correct way is to fundamentally
      change the way the functions work: Iterate over all BlockDriverStates,
      no matter who owns them, and drain them individually. Compensation is
      only necessary when a new BDS is created inside a drain_all section.
      Removal of a BDS doesn't require any action because it's gone afterwards
      anyway.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
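
      A hedged sketch of the resulting shape of bdrv_drain_all_begin(),
      assuming an all-nodes iterator along the lines of
      bdrv_next_all_states(); the real function additionally handles
      AioContext locking and nodes created inside the section:

          void drain_all_begin_sketch(void)
          {
              BlockDriverState *bs = NULL;

              /* Drain every BlockDriverState individually, no matter who
               * owns it (BlockBackend, monitor, another node, ...). */
              while ((bs = bdrv_next_all_states(bs))) {
                  bdrv_drained_begin(bs);   /* simplified: the real code
                                               quiesces first and polls
                                               once for the whole graph */
              }
          }
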
    • block: ignore_bds_parents parameter for drain functions · 6cd5c9d7
      Kevin Wolf authored
      In the future, bdrv_drain_all_begin/end() will drain all individual
      nodes separately rather than whole subtrees. This means that we don't
      want to propagate the drain to all parents any more: If the parent is a
      BDS, it will already be drained separately. Recursing to all parents is
      unnecessary work and would make it an O(n²) operation.
      
      Prepare the drain function for the changed drain_all by adding an
      ignore_bds_parents parameter to the internal implementation that
      prevents the propagation of the drain to BDS parents. We still (have to)
      propagate it to non-BDS parents like BlockBackends or Jobs because those
      are not drained separately.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
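
      A hedged sketch of how such a flag can be threaded through the
      parent-notification loop (field and callback names follow the QEMU
      structures of that era, but the exact signatures in block/io.c may
      differ):

          static void parent_drained_begin_sketch(BlockDriverState *bs,
                                                  bool ignore_bds_parents)
          {
              BdrvChild *c;

              QLIST_FOREACH(c, &bs->parents, next_parent) {
                  if (ignore_bds_parents && c->role->parent_is_bds) {
                      /* This parent is itself a BDS; drain_all will drain
                       * it separately, so skip it here to stay O(n). */
                      continue;
                  }
                  if (c->role->drained_begin) {
                      c->role->drained_begin(c);
                  }
              }
          }
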
    • block: Don't poll in parent drain callbacks · dcf94a23
      Kevin Wolf authored
      bdrv_do_drained_begin() is only safe if we have a single
      BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
      parent callbacks to introduce a nested polling loop that could cause
      graph changes while we're traversing the graph.
      
      Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
      node without waiting for its requests to complete. These requests will
      be waited for in the BDRV_POLL_WHILE() call down the call chain.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
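
      A simplified sketch of the split (argument lists are trimmed and the
      poll condition is illustrative; the real code lives in block/io.c):

          /* Quiesce a single node: stop new requests and notify parents and
           * the driver, but do NOT poll, so no graph changes can happen. */
          void drained_begin_quiesce_sketch(BlockDriverState *bs)
          {
              atomic_inc(&bs->quiesce_counter);
              /* ...parent drained_begin callbacks, driver drain callback... */
          }

          /* Full drained_begin: quiesce first, then wait for completion in
           * a single poll loop further down the call chain. */
          void drained_begin_sketch(BlockDriverState *bs)
          {
              drained_begin_quiesce_sketch(bs);
              BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
          }
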
    • block: Drain recursively with a single BDRV_POLL_WHILE() · fe4f0614
      Kevin Wolf authored
      Anything can happen inside BDRV_POLL_WHILE(), including graph
      changes that may interfere with its callers (e.g. child list iteration
      in recursive callers of bdrv_do_drained_begin).
      
      Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
      end of bdrv_do_drained_begin() to avoid such effects. The recursion
      happens now inside the loop condition. As the graph can only change
      between bdrv_drain_poll() calls, but not inside of it, doing the
      recursion here is safe.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
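
      A hedged sketch of the resulting control flow: the recursion over
      children moves into the condition that the single poll loop
      re-evaluates, so it always sees the current graph (helper name and
      poll condition are illustrative):

          /* Re-check the whole subtree every time the poll loop wakes up.
           * The graph can only change between evaluations of this condition,
           * never inside it, so iterating over bs->children here is safe. */
          static bool subtree_busy_sketch(BlockDriverState *bs)
          {
              BdrvChild *child;

              if (atomic_read(&bs->in_flight) > 0) {
                  return true;
              }
              QLIST_FOREACH(child, &bs->children, next) {
                  if (subtree_busy_sketch(child->bs)) {
                      return true;
                  }
              }
              return false;
          }

          /* At the end of drained_begin, one poll loop for the whole tree:
           *     BDRV_POLL_WHILE(bs, subtree_busy_sketch(bs));              */
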
    • block: Really pause block jobs on drain · 89bd0305
      Kevin Wolf authored
      We already requested that block jobs be paused in .bdrv_drained_begin,
      but no guarantee was made that the job was actually inactive at the
      point where bdrv_drained_begin() returned.
      
      This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
      uses it to make bdrv_drain_poll() consider block jobs using the node to
      be drained.
      
      For the test case to work as expected, we have to switch from
      block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
      considered active and must be waited for when draining the node.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
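
      A hedged sketch of what such a parent poll callback can look like for
      a job-owned child; block_job_still_active_sketch() is a hypothetical
      helper standing in for the job-state checks the real callback does:

          /* The parent's drained_poll callback tells the drain code whether
           * the block job still has activity in flight, so polling continues
           * until the job is really quiescent, not merely asked to pause. */
          static bool child_job_drained_poll_sketch(BdrvChild *c)
          {
              BlockJob *job = c->opaque;   /* job-owned children keep the job
                                              pointer in c->opaque */

              return block_job_still_active_sketch(job);
          }
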
  5. 15 Jun 2018, 1 commit
    • block: Add block-specific QDict header · 609f45ea
      Max Reitz authored
      There are numerous QDict functions that have been introduced for and are
      used only by the block layer.  Move their declarations into a header
      file of their own to reflect that.
      
      While qdict_extract_subqdict() is in fact used outside of the block
      layer (in util/qemu-config.c), it is still a function related very
      closely to how the block layer works with nested QDicts, namely by
      sometimes flattening them.  Therefore, its declaration is put into this
      header as well and util/qemu-config.c includes it with a comment stating
      exactly which function it needs.
      Suggested-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-Id: <20180509165530.29561-7-mreitz@redhat.com>
      [Copyright note tweaked, superfluous includes dropped]
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
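
      As a small illustration of the include pattern described above
      (assuming the new header ends up as block/qdict.h; the exact path is
      not stated in this message):

          /* in util/qemu-config.c */
          #include "block/qdict.h"   /* for qdict_extract_subqdict() */
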
  6. 11 Jun 2018, 2 commits
  7. 23 May 2018, 2 commits
  8. 04 May 2018, 2 commits
  9. 20 Mar 2018, 3 commits
  10. 19 Mar 2018, 2 commits
  11. 09 Mar 2018, 5 commits
  12. 03 Mar 2018, 5 commits
  13. 10 Feb 2018, 1 commit
    • block: Simplify bdrv_can_write_zeroes_with_unmap() · e24d813b
      Eric Blake authored
      We don't need the can_write_zeroes_with_unmap field in
      BlockDriverInfo, because it duplicates the information already
      available via supported_zero_flags & BDRV_REQ_MAY_UNMAP.  Note that
      BlockDriverInfo and supported_zero_flags are both per-device
      settings, rather than global state about the driver as a
      whole, which means one or both of these bits of information
      can already be conditional.  Let's audit how they were set:
      
      crypto: always setting can_write_ to false is pointless (the
      struct starts life zero-initialized), no use of supported_
      
      nbd: just recently fixed to set can_write_ if supported_
      includes MAY_UNMAP (thus this commit effectively reverts
      bca80059e and solves the problem mentioned there in a more
      global way)
      
      file-posix, iscsi, qcow2: can_write_ is conditional, while
      supported_ was unconditional; but passing MAY_UNMAP would
      fail with ENOTSUP if the condition wasn't met
      
      qed: can_write_ is unconditional, but pwrite_zeroes lacks
      support for MAY_UNMAP and supported_ is not set. Perhaps
      support can be added later (since it would be similar to
      qcow2), but for now claiming false is no real loss
      
      all other drivers: can_write_ is not set, and supported_ is
      either unset or a passthrough
      
      Simplify the code by moving the conditional into
      supported_zero_flags for all drivers, then dropping the
      now-unused BDI field.  For callers that relied on
      bdrv_can_write_zeroes_with_unmap(), we return the same
      per-device settings for drivers that had conditions (no
      observable change in behavior there); and can now return
      true (instead of false) for drivers that support passthrough
      (for example, the commit driver) which gives those drivers
      the same fix as nbd just got in bca80059e.  For callers that
      relied on supported_zero_flags, we now have a few more places
      that can avoid a wasted call to pwrite_zeroes() that will
      just fail with ENOTSUP.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180126193439.20219-1-eblake@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
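
      A small hedged sketch of the caller-side check this makes sufficient
      (function name is illustrative; the real helper in block.c may also
      consider read-only state):

          static bool can_write_zeroes_with_unmap_sketch(BlockDriverState *bs)
          {
              /* The per-device zero flags replace the old BDI field. */
              return bs->supported_zero_flags & BDRV_REQ_MAY_UNMAP;
          }
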
  14. 09 Feb 2018, 6 commits