1. 18 6月, 2018 21 次提交
    • G
      block: fix QEMU crash with scsi-hd and drive_del · f45280cb
      Greg Kurz 提交于
      Removing a drive with drive_del while it is being used to run an I/O
      intensive workload can cause QEMU to crash.
      
      An AIO flush can yield at some point:
      
      blk_aio_flush_entry()
       blk_co_flush(blk)
        bdrv_co_flush(blk->root->bs)
         ...
          qemu_coroutine_yield()
      
      and let the HMP command to run, free blk->root and give control
      back to the AIO flush:
      
          hmp_drive_del()
           blk_remove_bs()
            bdrv_root_unref_child(blk->root)
             child_bs = blk->root->bs
             bdrv_detach_child(blk->root)
              bdrv_replace_child(blk->root, NULL)
               blk->root->bs = NULL
              g_free(blk->root) <============== blk->root becomes stale
             bdrv_unref(child_bs)
              bdrv_delete(child_bs)
               bdrv_close()
                bdrv_drained_begin()
                 bdrv_do_drained_begin()
                  bdrv_drain_recurse()
                   aio_poll()
                    ...
                    qemu_coroutine_switch()
      
      and the AIO flush completion ends up dereferencing blk->root:
      
        blk_aio_complete()
         scsi_aio_complete()
          blk_get_aio_context(blk)
           bs = blk_bs(blk)
       ie, bs = blk->root ? blk->root->bs : NULL
                  ^^^^^
                  stale
      
      The problem is that we should avoid making block driver graph
      changes while we have in-flight requests. Let's drain all I/O
      for this BB before calling bdrv_root_unref_child().
      Signed-off-by: NGreg Kurz <groug@kaod.org>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      f45280cb
    • K
      test-bdrv-drain: Test graph changes in drain_all section · 19f7a7e5
      Kevin Wolf 提交于
      This tests both adding and remove a node between bdrv_drain_all_begin()
      and bdrv_drain_all_end(), and enabled the existing detach test for
      drain_all.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      19f7a7e5
    • K
      block: Allow graph changes in bdrv_drain_all_begin/end sections · 0f12264e
      Kevin Wolf 提交于
      bdrv_drain_all_*() used bdrv_next() to iterate over all root nodes and
      did a subtree drain for each of them. This works fine as long as the
      graph is static, but sadly, reality looks different.
      
      If the graph changes so that root nodes are added or removed, we would
      have to compensate for this. bdrv_next() returns each root node only
      once even if it's the root node for multiple BlockBackends or for a
      monitor-owned block driver tree, which would only complicate things.
      
      The much easier and more obviously correct way is to fundamentally
      change the way the functions work: Iterate over all BlockDriverStates,
      no matter who owns them, and drain them individually. Compensation is
      only necessary when a new BDS is created inside a drain_all section.
      Removal of a BDS doesn't require any action because it's gone afterwards
      anyway.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      0f12264e
    • K
      block: ignore_bds_parents parameter for drain functions · 6cd5c9d7
      Kevin Wolf 提交于
      In the future, bdrv_drained_all_begin/end() will drain all invidiual
      nodes separately rather than whole subtrees. This means that we don't
      want to propagate the drain to all parents any more: If the parent is a
      BDS, it will already be drained separately. Recursing to all parents is
      unnecessary work and would make it an O(n²) operation.
      
      Prepare the drain function for the changed drain_all by adding an
      ignore_bds_parents parameter to the internal implementation that
      prevents the propagation of the drain to BDS parents. We still (have to)
      propagate it to non-BDS parents like BlockBackends or Jobs because those
      are not drained separately.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      6cd5c9d7
    • K
      block: Move bdrv_drain_all_begin() out of coroutine context · c8ca33d0
      Kevin Wolf 提交于
      Before we can introduce a single polling loop for all nodes in
      bdrv_drain_all_begin(), we must make sure to run it outside of coroutine
      context like we already do for bdrv_do_drained_begin().
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      c8ca33d0
    • K
      block: Allow AIO_WAIT_WHILE with NULL ctx · 4d22bbf4
      Kevin Wolf 提交于
      bdrv_drain_all() wants to have a single polling loop for draining the
      in-flight requests of all nodes. This means that the AIO_WAIT_WHILE()
      condition relies on activity in multiple AioContexts, which is polled
      from the mainloop context. We must therefore call AIO_WAIT_WHILE() from
      the mainloop thread and use the AioWait notification mechanism.
      
      Just randomly picking the AioContext of any non-mainloop thread would
      work, but instead of bothering to find such a context in the caller, we
      can just as well accept NULL for ctx.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      4d22bbf4
    • K
      test-bdrv-drain: Test that bdrv_drain_invoke() doesn't poll · 57320ca9
      Kevin Wolf 提交于
      This adds a test case that goes wrong if bdrv_drain_invoke() calls
      aio_poll().
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      57320ca9
    • K
      block: Defer .bdrv_drain_begin callback to polling phase · 0109e7e6
      Kevin Wolf 提交于
      We cannot allow aio_poll() in bdrv_drain_invoke(begin=true) until we're
      done with propagating the drain through the graph and are doing the
      single final BDRV_POLL_WHILE().
      
      Just schedule the coroutine with the callback and increase bs->in_flight
      to make sure that the polling phase will wait for it.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      0109e7e6
    • K
      test-bdrv-drain: Graph change through parent callback · 231281ab
      Kevin Wolf 提交于
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      231281ab
    • K
      block: Don't poll in parent drain callbacks · dcf94a23
      Kevin Wolf 提交于
      bdrv_do_drained_begin() is only safe if we have a single
      BDRV_POLL_WHILE() after quiescing all affected nodes. We cannot allow
      that parent callbacks introduce a nested polling loop that could cause
      graph changes while we're traversing the graph.
      
      Split off bdrv_do_drained_begin_quiesce(), which only quiesces a single
      node without waiting for its requests to complete. These requests will
      be waited for in the BDRV_POLL_WHILE() call down the call chain.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      dcf94a23
    • K
      test-bdrv-drain: Test node deletion in subtree recursion · ebd31837
      Kevin Wolf 提交于
      If bdrv_do_drained_begin() polls during its subtree recursion, the graph
      can change and mess up the bs->children iteration. Test that this
      doesn't happen.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      ebd31837
    • K
      block: Drain recursively with a single BDRV_POLL_WHILE() · fe4f0614
      Kevin Wolf 提交于
      Anything can happen inside BDRV_POLL_WHILE(), including graph
      changes that may interfere with its callers (e.g. child list iteration
      in recursive callers of bdrv_do_drained_begin).
      
      Switch to a single BDRV_POLL_WHILE() call for the whole subtree at the
      end of bdrv_do_drained_begin() to avoid such effects. The recursion
      happens now inside the loop condition. As the graph can only change
      between bdrv_drain_poll() calls, but not inside of it, doing the
      recursion here is safe.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      fe4f0614
    • M
      test-bdrv-drain: Add test for node deletion · 4c8158e3
      Max Reitz 提交于
      This patch adds two bdrv-drain tests for what happens if some BDS goes
      away during the drainage.
      
      The basic idea is that you have a parent BDS with some child nodes.
      Then, you drain one of the children.  Because of that, the party who
      actually owns the parent decides to (A) delete it, or (B) detach all its
      children from it -- both while the child is still being drained.
      
      A real-world case where this can happen is the mirror block job, which
      may exit if you drain one of its children.
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      4c8158e3
    • K
      block: Remove bdrv_drain_recurse() · d30b8e64
      Kevin Wolf 提交于
      For bdrv_drain(), recursively waiting for child node requests is
      pointless because we didn't quiesce their parents, so new requests could
      come in anyway. Letting the function work only on a single node makes it
      more consistent.
      
      For subtree drains and drain_all, we already have the recursion in
      bdrv_do_drained_begin(), so the extra recursion doesn't add anything
      either.
      
      Remove the useless code.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      d30b8e64
    • K
      block: Really pause block jobs on drain · 89bd0305
      Kevin Wolf 提交于
      We already requested that block jobs be paused in .bdrv_drained_begin,
      but no guarantee was made that the job was actually inactive at the
      point where bdrv_drained_begin() returned.
      
      This introduces a new callback BdrvChildRole.bdrv_drained_poll() and
      uses it to make bdrv_drain_poll() consider block jobs using the node to
      be drained.
      
      For the test case to work as expected, we have to switch from
      block_job_sleep_ns() to qemu_co_sleep_ns() so that the test job is even
      considered active and must be waited for when draining the node.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      89bd0305
    • K
      block: Avoid unnecessary aio_poll() in AIO_WAIT_WHILE() · 1cc8e54a
      Kevin Wolf 提交于
      Commit 91af091f added an additional aio_poll() to BDRV_POLL_WHILE()
      in order to make sure that all pending BHs are executed on drain. This
      was the wrong place to make the fix, as it is useless overhead for all
      other users of the macro and unnecessarily complicates the mechanism.
      
      This patch effectively reverts said commit (the context has changed a
      bit and the code has moved to AIO_WAIT_WHILE()) and instead polls in the
      loop condition for drain.
      
      The effect is probably hard to measure in any real-world use case
      because actual I/O will dominate, but if I run only the initialisation
      part of 'qemu-img convert' where it calls bdrv_block_status() for the
      whole image to find out how much data there is copy, this phase actually
      needs only roughly half the time after this patch.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      1cc8e54a
    • K
      tests/test-bdrv-drain: bdrv_drain_all() works in coroutines now · 6d0252f2
      Kevin Wolf 提交于
      Since we use bdrv_do_drained_begin/end() for bdrv_drain_all_begin/end(),
      coroutine context is automatically left with a BH, preventing the
      deadlocks that made bdrv_drain_all*() unsafe in coroutine context. Now
      that we even removed the old polling code as dead code, it's obvious
      that it's compatible now.
      
      Enable the coroutine test cases for bdrv_drain_all().
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      6d0252f2
    • K
      block: Don't manually poll in bdrv_drain_all() · c13ad59f
      Kevin Wolf 提交于
      All involved nodes are already idle, we called bdrv_do_drain_begin() on
      them.
      
      The comment in the code suggested that this was not correct because the
      completion of a request on one node could spawn a new request on a
      different node (which might have been drained before, so we wouldn't
      drain the new request). In reality, new requests to different nodes
      aren't spawned out of nothing, but only in the context of a parent
      request, and they aren't submitted to random nodes, but only to child
      nodes. As long as we still poll for the completion of the parent request
      (which we do), draining each root node separately is good enough.
      
      Remove the additional polling code from bdrv_drain_all_begin() and
      replace it with an assertion that all nodes are already idle after we
      drained them separately.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      c13ad59f
    • K
      block: Remove 'recursive' parameter from bdrv_drain_invoke() · 7d40d9ef
      Kevin Wolf 提交于
      All callers pass false for the 'recursive' parameter now. Remove it.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      7d40d9ef
    • K
      block: Use bdrv_do_drain_begin/end in bdrv_drain_all() · 79ab8b21
      Kevin Wolf 提交于
      bdrv_do_drain_begin/end() implement already everything that
      bdrv_drain_all_begin/end() need and currently still do manually: Disable
      external events, call parent drain callbacks, call block driver
      callbacks.
      
      It also does two more things:
      
      The first is incrementing bs->quiesce_counter. bdrv_drain_all() already
      stood out in the test case by behaving different from the other drain
      variants. Adding this is not only safe, but in fact a bug fix.
      
      The second is calling bdrv_drain_recurse(). We already do that later in
      the same function in a loop, so basically doing an early first iteration
      doesn't hurt.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
      79ab8b21
    • K
      test-bdrv-drain: bdrv_drain() works with cross-AioContext events · bb675689
      Kevin Wolf 提交于
      As long as nobody keeps the other I/O thread from working, there is no
      reason why bdrv_drain() wouldn't work with cross-AioContext events. The
      key is that the root request we're waiting for is in the AioContext
      we're polling (which it always is for bdrv_drain()) so that aio_poll()
      is woken up in the end.
      
      Add a test case that shows that it works. Remove the comment in
      bdrv_drain() that claims otherwise.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      bb675689
  2. 16 6月, 2018 2 次提交
    • P
      Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180615a' into staging · 2ef2f167
      Peter Maydell 提交于
      Migration pull 2018-06-15
      
      # gpg: Signature made Fri 15 Jun 2018 16:13:17 BST
      # gpg:                using RSA key 0516331EBC5BFDE7
      # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
      # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7
      
      * remotes/dgilbert/tags/pull-migration-20180615a:
        migration: calculate expected_downtime with ram_bytes_remaining()
        migration/postcopy: Wake rate limit sleep on postcopy request
        migration: Wake rate limiting for urgent requests
        migration/postcopy: Add max-postcopy-bandwidth parameter
        migration: introduce migration_update_rates
        migration: fix counting xbzrle cache_miss_rate
        migration/block-dirty-bitmap: fix dirty_bitmap_load
        migration: Poison ramblock loops in migration
        migration: Fixes for non-migratable RAMBlocks
        typedefs: add QJSON
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      2ef2f167
    • P
      Merge remote-tracking branch... · 42747d6a
      Peter Maydell 提交于
      Merge remote-tracking branch 'remotes/edgar/tags/edgar/xilinx-next-2018-06-15.for-upstream' into staging
      
      xilinx-next-2018-06-15.for-upstream
      
      # gpg: Signature made Fri 15 Jun 2018 15:32:47 BST
      # gpg:                using RSA key 29C596780F6BCA83
      # gpg: Good signature from "Edgar E. Iglesias (Xilinx key) <edgar.iglesias@xilinx.com>"
      # gpg:                 aka "Edgar E. Iglesias <edgar.iglesias@gmail.com>"
      # Primary key fingerprint: AC44 FEDC 14F7 F1EB EDBF  4151 29C5 9678 0F6B CA83
      
      * remotes/edgar/tags/edgar/xilinx-next-2018-06-15.for-upstream:
        target-microblaze: Rework NOP/zero instruction handling
        target-microblaze: mmu: Correct masking of output addresses
      Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>
      42747d6a
  3. 15 6月, 2018 17 次提交