1. 30 Apr, 2019 1 commit
  2. 13 Mar, 2019 2 commits
    • block: Add a 'mutable_opts' field to BlockDriver · 8a2ce0bc
      Alberto Garcia committed
      If we reopen a BlockDriverState and there is an option that is present
      in bs->options but missing from the new set of options then we have to
      return an error unless the driver is able to reset it to its default
      value.
      
      This patch adds a new 'mutable_opts' field to BlockDriver. This is
      a list of runtime options that can be modified during reopen. If an
      option in this list is unspecified on reopen then it must be reset (or
      return an error).
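The rule described above can be sketched as follows. This is a minimal illustration, not QEMU's implementation: options are modelled as NULL-terminated arrays of names, and `opt_in_list()` and `check_missing_opts()` are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: is `name` present in a NULL-terminated list? */
static bool opt_in_list(const char *const *list, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; list && list[i]; i++) {
        if (strcmp(list[i], name) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Return 0 if every option that is set in old_opts but absent from
 * new_opts is listed in mutable_opts (so the driver may reset it to its
 * default), or -1 if some missing option cannot be reset.
 */
static int check_missing_opts(const char *const *old_opts,
                              const char *const *new_opts,
                              const char *const *mutable_opts)
{
    for (size_t i = 0; old_opts && old_opts[i]; i++) {
        if (!opt_in_list(new_opts, old_opts[i]) &&
            !opt_in_list(mutable_opts, old_opts[i])) {
            return -1;
        }
    }
    return 0;
}
```

A reopen that drops an option listed in `mutable_opts` passes the check; dropping any other previously set option is reported as an error.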
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Allow freezing BdrvChild links · 2cad1ebe
      Alberto Garcia committed
      Our permission system is useful to define what operations are allowed
      on a certain block node and includes things like BLK_PERM_WRITE or
      BLK_PERM_RESIZE among others.
      
      One of the permissions is BLK_PERM_GRAPH_MOD which allows "changing
      the node that this BdrvChild points to". The exact meaning of this has
      never been very clear, but it can be understood as "change any of the
      links connected to the node". This can be used to prevent changing a
      backing link, but it's too coarse.
      
      This patch adds a new 'frozen' attribute to BdrvChild, which forbids
      detaching the link from the node it points to, and new API to freeze
      and unfreeze a backing chain.
      
      After this change a few functions can fail, so they need additional
      checks.
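As a rough sketch of the idea, assuming toy `Node` and `Link` types that merely stand in for BlockDriverState and BdrvChild, and hypothetical helper names that are not QEMU's API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for BlockDriverState and BdrvChild (not QEMU's types). */
typedef struct Node Node;
typedef struct Link {
    Node *target;   /* node this link points to */
    bool frozen;    /* if set, the link must not be detached */
} Link;
struct Node {
    Link backing;   /* backing.target == NULL ends the chain */
};

/* Freeze (or unfreeze) every backing link from top down to base. */
static void chain_set_frozen(Node *top, const Node *base, bool frozen)
{
    assert(top != NULL);
    for (Node *n = top; n != base && n->backing.target;
         n = n->backing.target) {
        n->backing.frozen = frozen;
    }
}

/* Replacing a backing link can now fail, so callers must check. */
static int set_backing(Node *n, Node *new_target)
{
    assert(n != NULL);
    if (n->backing.frozen) {
        return -1;  /* link is frozen: refuse to detach it */
    }
    n->backing.target = new_target;
    return 0;
}
```

The return value of `set_backing()` illustrates why functions that previously could not fail need additional checks once links can be frozen.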
      Signed-off-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  3. 08 Mar, 2019 2 commits
  4. 25 Feb, 2019 6 commits
    • block: Purify .bdrv_refresh_filename() · 998b3a1e
      Max Reitz committed
      Currently, BlockDriver.bdrv_refresh_filename() is supposed to both
      refresh the filename (BDS.exact_filename) and set BDS.full_open_options.
      Now that we have generic code in the central bdrv_refresh_filename() for
      creating BDS.full_open_options, we can drop the latter part from all
      BlockDriver.bdrv_refresh_filename() implementations.
      
      This also means that we can drop all of the existing default code for
      this from the global bdrv_refresh_filename() itself.
      
      Furthermore, we now have to call BlockDriver.bdrv_refresh_filename()
      after having set BDS.full_open_options, because the block driver's
      implementation should now be allowed to depend on BDS.full_open_options
      being set correctly.
      
      Finally, with this patch we can drop the @options parameter from
      BlockDriver.bdrv_refresh_filename(); also, add a comment on this
      function's purpose in block/block_int.h while touching its interface.
      
      This completely obsoletes blklogwrite's implementation of
      .bdrv_refresh_filename().
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20190201192935.18394-25-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Add BlockDriver.bdrv_gather_child_options · abc521a9
      Max Reitz committed
      Some follow-up patches will rework the way bs->full_open_options is
      refreshed in bdrv_refresh_filename(). The new implementation will remove
      the need for the block drivers' bdrv_refresh_filename() implementations
      to set bs->full_open_options; instead, it will be generic and use static
      information from each block driver.
      
      However, by implementing bdrv_gather_child_options(), block drivers will
      still be able to override the way the full_open_options of their
      children are incorporated into their own.
      
      We need to implement this function for VMDK because we have to prevent
      the generic implementation from gathering the options of all children:
      It is not possible to specify options for the extents through the
      runtime options.
      
      For quorum, the child names that would be used by the generic
      implementation and the ones that we actually (currently) want to use
      differ. See quorum_gather_child_options() for more information.
      
      Note that both of these are cases which are not ideal: In case of VMDK
      it would probably be nice to be able to specify options for all extents.
      In case of quorum, the current runtime option structure is simply broken
      and needs to be fixed (but that is left for another patch).
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190201192935.18394-23-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Add strong_runtime_opts to BlockDriver · 2654267c
      Max Reitz committed
      This new field can be set by block drivers to list the runtime options
      they accept that may influence the contents of the respective BDS. As of
      a follow-up patch, this list will be used by the common
      bdrv_refresh_filename() implementation to decide which options to put
      into BDS.full_open_options (and consequently whether a JSON filename has
      to be created), thus freeing the drivers of having to implement that
      logic themselves.
      
      Additionally, this patch adds the field to all of the block drivers that
      need it and sets it accordingly.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190201192935.18394-22-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Add bdrv_dirname() · 1e89d0f9
      Max Reitz committed
      This function may be implemented by block drivers to derive a directory
      name from a BDS. Concatenating this g_free()-able string with a relative
      filename must result in a valid (not necessarily existing) filename, so
      this function should generally not be implemented by format
      drivers, because it is protocol-specific.
      
      If a BDS's driver does not implement this function, bdrv_dirname() will
      fall through to the BDS's file if it exists. If it does not, the
      exact_filename field will be used to generate a directory name.
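The fallback on a plain filename could look roughly like this. It is a sketch under stated assumptions: it uses malloc()/free() from the C library where QEMU would use GLib allocation, it only understands '/'-separated paths, and `dirname_prefix()` and `resolve_relative()` are names made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc()ed copy of the directory part of `filename`,
 * including the trailing '/', or an empty string if there is no '/'.
 */
static char *dirname_prefix(const char *filename)
{
    assert(filename != NULL);
    const char *slash = strrchr(filename, '/');
    size_t len = slash ? (size_t)(slash - filename) + 1 : 0;
    char *dir = malloc(len + 1);
    if (!dir) {
        return NULL;
    }
    memcpy(dir, filename, len);
    dir[len] = '\0';
    return dir;
}

/* Concatenate the directory prefix with a relative filename. */
static char *resolve_relative(const char *base_filename, const char *relative)
{
    assert(relative != NULL);
    char *dir = dirname_prefix(base_filename);
    if (!dir) {
        return NULL;
    }
    char *out = malloc(strlen(dir) + strlen(relative) + 1);
    if (out) {
        strcpy(out, dir);
        strcat(out, relative);
    }
    free(dir);
    return out;
}
```

The result only has to be a valid filename, not an existing one, which is why the sketch never touches the filesystem.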
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190201192935.18394-15-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Add BDS.auto_backing_file · 998c2019
      Max Reitz committed
      If the backing file is overridden, this most probably does change the
      guest-visible data of a BDS.  Therefore, we will need to consider this
      in bdrv_refresh_filename().
      
      To see whether it has been overridden, we might want to compare
      bs->backing_file and bs->backing->bs->filename.  However,
      bs->backing_file is changed by bdrv_set_backing_hd() (which is just used
      to change the backing child at runtime, without modifying the image
      header), so bs->backing_file most of the time simply contains a copy of
      bs->backing->bs->filename anyway, so it is useless for such a
      comparison.
      
      This patch adds an auto_backing_file BDS field which contains the
      backing file path as indicated by the image header, which is not changed
      by bdrv_set_backing_hd().
      
      Because of bdrv_refresh_filename() magic, however, a BDS's filename may
      differ from what has been specified during bdrv_open().  Then, the
      comparison between bs->auto_backing_file and bs->backing->bs->filename
      may fail even though bs->backing was opened from bs->auto_backing_file.
      To mitigate this, we can copy the real BDS's filename (after the whole
      bdrv_open() and bdrv_refresh_filename() process) into
      bs->auto_backing_file, if we know the former has been opened based on
      the latter.  This is only possible if no options modifying the backing
      file's behavior have been specified, though.  To simplify things, this
      patch only copies the filename from the backing file if no options have
      been specified for it at all.
      
      Furthermore, there are cases where an overlay is created by qemu which
      already contains a BDS's filename (e.g. in blockdev-snapshot-sync).  We
      do not need to worry about updating the overlay's bs->auto_backing_file
      there, because we actually wrote a post-bdrv_refresh_filename() filename
      into the image header.
      
      So all in all, there will be false negatives where (as of a future
      patch) bdrv_refresh_filename() will assume that the backing file differs
      from what was specified in the image header, even though it really does
      not.  However, these cases should be limited to where (1) the user
      actually did override something in the backing chain (e.g. by specifying
      options for the backing file), or (2) the user executed a QMP command to
      change some node's backing file (e.g. change-backing-file or
      block-commit with @backing-file given) where the given filename does not
      happen to coincide with qemu's idea of the backing BDS's filename.
      
      Then again, (1) really is limited to -drive.  With -blockdev or
      blockdev-add, you have to adhere to the schema, so a user cannot give
      partial "unimportant" options (e.g. by just setting backing.node-name
      and leaving the rest to the image header).  Therefore, trying to fix
      this would mean trying to fix something for -drive only.
      
      To improve on (2), we would need a full infrastructure to "canonicalize"
      an arbitrary filename (+ options), so it can be compared against
      another.  That seems a bit over the top, considering that filenames
      nowadays are there mostly for the user's entertainment.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20190201192935.18394-5-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: fix bdrv_check_perm for non-tree subgraph · f962e961
      Vladimir Sementsov-Ogievskiy committed
      bdrv_check_perm, in its recursion, checks each node in the context of new
      permissions for one parent, because of the nature of DFS. This works well
      as long as the children subgraph of the top-most updated node is a tree,
      i.e. it doesn't have any kind of loops. But if we have a loop (undirected,
      of course), i.e. we have two different paths from the top node to some
      child node, then bdrv_check_perm will do the wrong thing:
      
        top
        | \
        |  |
        v  v
        A  B
        |  |
        v  v
        node
      
      It will check the new permissions of node once in the context of new A
      permissions and old B permissions, and once vice versa. This is wrong
      and may lead to corruption of the permission system. We may start with
      no permissions and all-shared for both the A->node and B->node relations
      and end up with non-shared write permission on both paths.
      
      The following commit will add a test, which shows this bug.
      
      To fix this situation, let's really set BdrvChild permissions during
      bdrv_check_perm procedure. And we are happy here, as check-perm is
      already written in transaction manner, so we just need to restore
      backed-up permissions in _abort.
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  5. 12 Feb, 2019 1 commit
  6. 30 Oct, 2018 1 commit
  7. 25 Sep, 2018 4 commits
  8. 10 Jul, 2018 3 commits
    • block: Use uint64_t for BdrvTrackedRequest byte fields · 22931a15
      Fam Zheng committed
      This matches the types used for bytes in the rest of the block layer.
      In the case of bdrv_co_truncate, new_bytes can be the image size, which
      probably doesn't fit in a 32-bit int.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: split flags in copy_range · 67b51fb9
      Vladimir Sementsov-Ogievskiy committed
      Pass read flags and write flags separately. This is needed to handle
      the upcoming BDRV_REQ_NO_SERIALISING flag cleanly in the following patches.
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Poll after drain on attaching a node · 4be6a6d1
      Kevin Wolf committed
      Commit dcf94a23 ('block: Don't poll in parent drain callbacks')
      removed polling in bdrv_child_cb_drained_begin() on the grounds that the
      original bdrv_drain() already will poll and BdrvChildRole.drained_begin
      calls must not cause graph changes (and therefore must not call
      aio_poll(), or the recursion through the graph will break).
      
      This reasoning is correct for calls through bdrv_do_drained_begin().
      However, BdrvChildRole.drained_begin is also called when a node that is
      already in a drained section (i.e. bdrv_do_drained_begin() has already
      returned and therefore can't poll any more) is attached to a new parent.
      In this case, we must explicitly poll to have all requests completed
      before the drained new child can be attached to the parent.
      
      In bdrv_replace_child_noperm(), we know that we're not inside the
      recursion of bdrv_do_drained_begin() because graph changes are not
      allowed there, and bdrv_replace_child_noperm() is a graph change. The
      call of BdrvChildRole.drained_begin() must therefore be followed by a
      BDRV_POLL_WHILE() that waits for the completion of requests.
      Reported-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  9. 29 Jun, 2018 3 commits
    • block: Use tracked request for truncate · 1bc5f09f
      Kevin Wolf committed
      When growing an image, block drivers (especially protocol drivers) may
      initialise the newly added area. I/O requests to the same area need to
      wait for this initialisation to be completed so that data writes don't
      get overwritten and reads don't read uninitialised data.
      
      To avoid overhead in the fast I/O path by adding new locking in the
      protocol drivers and to restrict the impact to requests that actually
      touch the new area, reuse the existing tracked request infrastructure in
      block/io.c and mark all truncate requests as serialising.
      
      With this change, it is safe for protocol drivers to make
      .bdrv_co_truncate actually asynchronous.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Move bdrv_truncate() implementation to io.c · 3d9f2d2a
      Kevin Wolf committed
      This moves the bdrv_truncate() implementation from block.c to block/io.c
      so it can have access to the tracked requests infrastructure.
      
      This involves making refresh_total_sectors() public (in block_int.h).
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: Convert .bdrv_truncate callback to coroutine_fn · 061ca8a3
      Kevin Wolf committed
      bdrv_truncate() is an operation that can block (even for a quite long
      time, depending on the PreallocMode) in I/O paths that shouldn't block.
      Convert it to a coroutine_fn so that we have the infrastructure for
      drivers to make their .bdrv_co_truncate implementation asynchronous.
      
      This change could potentially introduce new race conditions because
      bdrv_truncate() isn't necessarily executed atomically any more. Whether
      this is a problem needs to be evaluated for each block driver that
      supports truncate:
      
      * file-posix/win32, gluster, iscsi, nfs, rbd, ssh, sheepdog: The
        protocol drivers are trivially safe because they don't actually yield
        yet, so there is no change in behaviour.
      
      * copy-on-read, crypto, raw-format: Essentially just filter drivers that
        pass the request to a child node, no problem.
      
      * qcow2: The implementation modifies metadata, so it needs to hold
        s->lock to be safe with concurrent I/O requests. In order to avoid
        double locking, this requires pulling the locking out into
        preallocate_co() and using qcow2_write_caches() instead of
        bdrv_flush().
      
      * qed: Does a single header update, this is fine without locking.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
  10. 18 Jun, 2018 4 commits
    • block/mirror: Add copy mode QAPI interface · 481debaa
      Max Reitz committed
      This patch allows the user to specify whether to use active or only
      background mode for mirror block jobs.  Currently, this setting will
      remain constant for the duration of the entire block job.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Message-id: 20180613181823.13618-14-mreitz@redhat.com
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Allow graph changes in bdrv_drain_all_begin/end sections · 0f12264e
      Kevin Wolf committed
      bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
      did a subtree drain for each of them. This works fine as long as the
      graph is static, but sadly, reality looks different.
      
      If the graph changes so that root nodes are added or removed, we would
      have to compensate for this. bdrv_next() returns each root node only
      once even if it's the root node for multiple BlockBackends or for a
      monitor-owned block driver tree, which would only complicate things.
      
      The much easier and more obviously correct way is to fundamentally
      change the way the functions work: Iterate over all BlockDriverStates,
      no matter who owns them, and drain them individually. Compensation is
      only necessary when a new BDS is created inside a drain_all section.
      Removal of a BDS doesn't require any action because it's gone afterwards
      anyway.
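The per-node approach, including the compensation for nodes created inside a drain_all section, can be sketched with a toy graph. The `Node` and `Graph` types and the function names below are illustrative assumptions, not QEMU's definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Node {
    int quiesce_counter;    /* > 0 means the node is drained */
    struct Node *next;      /* link in the global list of all nodes */
} Node;

typedef struct Graph {
    Node *all_nodes;        /* every node, no matter who owns it */
    int drain_all_count;    /* number of open drain_all sections */
} Graph;

/* Drain every node individually instead of walking subtrees from roots. */
static void drain_all_begin(Graph *g)
{
    assert(g != NULL);
    g->drain_all_count++;
    for (Node *n = g->all_nodes; n; n = n->next) {
        n->quiesce_counter++;
    }
}

static void drain_all_end(Graph *g)
{
    assert(g != NULL && g->drain_all_count > 0);
    for (Node *n = g->all_nodes; n; n = n->next) {
        n->quiesce_counter--;
    }
    g->drain_all_count--;
}

/* A node created inside a drain_all section starts out drained. */
static void graph_add_node(Graph *g, Node *n)
{
    assert(g != NULL && n != NULL);
    n->quiesce_counter = g->drain_all_count;
    n->next = g->all_nodes;
    g->all_nodes = n;
}
```

Node removal needs no counterpart in this model, matching the observation that a removed BDS is gone afterwards anyway.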
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: ignore_bds_parents parameter for drain functions · 6cd5c9d7
      Kevin Wolf committed
      In the future, bdrv_drain_all_begin/end() will drain all individual
      nodes separately rather than whole subtrees. This means that we don't
      want to propagate the drain to all parents any more: If the parent is a
      BDS, it will already be drained separately. Recursing to all parents is
      unnecessary work and would make it an O(n²) operation.
      
      Prepare the drain function for the changed drain_all by adding an
      ignore_bds_parents parameter to the internal implementation that
      prevents the propagation of the drain to BDS parents. We still (have to)
      propagate it to non-BDS parents like BlockBackends or Jobs because those
      are not drained separately.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Really pause block jobs on drain · 89bd0305
      Kevin Wolf committed
      We already requested that block jobs be paused in .bdrv_drained_begin,
      but no guarantee was made that the job was actually inactive at the
      point where bdrv_drained_begin() returned.
      
      This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
      uses it to make bdrv_drain_poll() consider block jobs using the node to
      be drained.
      
      For the test case to work as expected, we have to switch from
      block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
      considered active and must be waited for when draining the node.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  11. 11 Jun, 2018 1 commit
  12. 01 Jun, 2018 1 commit
  13. 23 May, 2018 1 commit
  14. 15 May, 2018 4 commits
    • block: Document BDRV_REQ_WRITE_UNCHANGED support · c1e3489d
      Max Reitz committed
      Add BDRV_REQ_WRITE_UNCHANGED to the list of flags honored during pwrite
      and pwrite_zeroes, and also add a note on when you absolutely need to
      support it.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Message-id: 20180502140359.18222-1-mreitz@redhat.com
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Max Reitz <mreitz@redhat.com>
    • block: Merge .bdrv_co_writev{,_flags} in drivers · e18a58b4
      Eric Blake committed
      We have too many driver callback interfaces; simplify the mess
      somewhat by merging the flags parameter of .bdrv_co_writev_flags()
      into .bdrv_co_writev().  Note that as long as a driver doesn't set
      .supported_write_flags, the flags argument will be 0 and behavior is
      identical.  Also note that the public function bdrv_co_writev() still
      lacks a flags argument; so the driver signature is thus intentionally
      slightly different.  But that's not the end of the world, nor the first
      time that the driver interface differs slightly from the public
      interface.
      
      Ideally, we should be rewriting all of these drivers to use modern
      byte-based interfaces.  But that's a more invasive patch to write
      and audit, compared to the simplification done here.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Drop last of the sector-based aio callbacks · edfab6a0
      Eric Blake committed
      We are gradually moving away from sector-based interfaces, towards
      byte-based.  Now that all drivers with aio callbacks are using the
      byte-based interfaces, we can remove the sector-based versions.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Support byte-based aio callbacks · e31f6864
      Eric Blake committed
      We are gradually moving away from sector-based interfaces, towards
      byte-based.  Add new byte-based aio callbacks for read and write,
      to match the fact that bdrv_aio_pdiscard is already byte-based.
      
      Ideally, drivers should be converted to use coroutine callbacks
      rather than aio; but that is not quite as trivial (and if we were
      to do that conversion, the null-aio driver would disappear), so for
      the short term, converting the signature but keeping things with
      aio is easier.  However, we CAN declare that a driver that uses
      the byte-based aio interfaces now defaults to byte-based
      operations, and must explicitly provide a refresh_limits override
      to stick with larger alignments (making the alignment issues more
      obvious directly in the drivers touched in the next few patches).
      
      Once all drivers are converted, the sector-based aio callbacks will
      be removed; in the meantime, a FIXME comment is added due to a
      slight inefficiency that will be touched up as part of that later
      cleanup.
      
      Simplify some instances of 'bs->drv' into 'drv' while touching this,
      since the local variable already exists to reduce typing.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  15. 26 Mar, 2018 1 commit
  16. 09 Mar, 2018 3 commits
  17. 03 Mar, 2018 2 commits
    • block: rename .bdrv_create() to .bdrv_co_create_opts() · efc75e2a
      Stefan Hajnoczi committed
      BlockDriver->bdrv_create() has been called from coroutine context since
      commit 5b7e1542 ("block: make bdrv_create adopt coroutine").
      
      Make this explicit by renaming to .bdrv_co_create_opts() and add the
      coroutine_fn annotation.  This makes it obvious to block driver authors
      that they may yield, use CoMutex, or other coroutine_fn APIs.
      bdrv_co_create is reserved for the QAPI-based version that Kevin is
      working on.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Message-Id: <20170705102231.20711-2-stefanha@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: extract AIO_WAIT_WHILE() from BlockDriverState · 7719f3c9
      Stefan Hajnoczi committed
      BlockDriverState has the BDRV_POLL_WHILE() macro to wait on event loop
      activity while a condition evaluates to true.  This is used to implement
      synchronous operations where it acts as a condvar between the IOThread
      running the operation and the main loop waiting for the operation.  It
      can also be called from the thread that owns the AioContext and in that
      case it's just a nested event loop.
      
      BlockBackend needs this behavior but doesn't always have a
      BlockDriverState it can use.  This patch extracts BDRV_POLL_WHILE() into
      the AioWait abstraction, which can be used with AioContext and isn't
      tied to BlockDriverState anymore.
      
      This feature could be built directly into AioContext but then all users
      would kick the event loop even if they signal different conditions.
      Imagine an AioContext with many BlockDriverStates, each time a request
      completes any waiter would wake up and re-check their condition.  It's
      nicer to keep a separate AioWait object for each condition instead.
      
      Please see "block/aio-wait.h" for details on the API.
      
      The name AIO_WAIT_WHILE() avoids the confusion between AIO_POLL_WHILE()
      and AioContext polling.
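The "nested event loop" case can be pictured with a toy loop and a wait macro. Everything here is an assumption made for illustration: the `Loop` type, `loop_poll()` and `WAIT_WHILE()` are not the API from "block/aio-wait.h", and a real event loop would block for events rather than spin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy event loop: each poll completes one pending request, if any. */
typedef struct Loop {
    int pending;     /* requests still queued in the loop */
    int *in_flight;  /* counter that completions decrement */
} Loop;

/* Run one iteration; returns true if progress was made. */
static bool loop_poll(Loop *l)
{
    assert(l != NULL && l->in_flight != NULL);
    if (l->pending == 0) {
        return false;
    }
    l->pending--;
    (*l->in_flight)--;
    return true;
}

/* Keep running the loop for as long as `cond` evaluates to true. */
#define WAIT_WHILE(loop, cond)      \
    do {                            \
        while ((cond)) {            \
            loop_poll(loop);        \
        }                           \
    } while (0)
```

A caller would write something like `WAIT_WHILE(&loop, in_flight > 0);` to implement a synchronous operation on top of asynchronous completions.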
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>