提交 · f3ede4b05d1407d48314f0edada5e402865e641e · openeuler / qemu

31 10月, 2016 2 次提交

block: Block all intermediate nodes in commit_active_start() · f3ede4b0

由 Alberto Garcia 提交于 10月 28, 2016

When block-commit is launched without the top parameter, it uses
internally a mirror block job. In that case all intermediate nodes
between the active and base nodes must be blocked as well.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

f3ede4b0

block: Use block_job_add_bdrv() in mirror_start_job() · cee3c6b5

由 Alberto Garcia 提交于 10月 28, 2016

Use block_job_add_bdrv() instead of blocking all operations in
mirror_start_job() and unblocking them in mirror_exit().
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

cee3c6b5

28 10月, 2016 2 次提交

mirror: use bdrv_drained_begin/bdrv_drained_end · 9a0cec66

由 Paolo Bonzini 提交于 10月 27, 2016

Ensure that there are no changes between the last check to
bdrv_get_dirty_count and the switch to the target.

There is already a bdrv_drained_end call, we only need to ensure
that bdrv_drained_begin is not called twice.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-4-git-send-email-pbonzini@redhat.com>
Signed-off-by: NFam Zheng <famz@redhat.com>

9a0cec66

blockjob: introduce .drain callback for jobs · bae8196d

由 Paolo Bonzini 提交于 10月 27, 2016

This is required to decouple block jobs from running in an
AioContext.  With multiqueue block devices, a BlockDriverState
does not really belong to a single AioContext.

The solution is to first wait until all I/O operations are
complete; then loop in the main thread for the block job to
complete entirely.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-Id: <1477565348-5458-3-git-send-email-pbonzini@redhat.com>
Signed-off-by: NFam Zheng <famz@redhat.com>

bae8196d

24 10月, 2016 1 次提交

block: Hide HBitmap in block dirty bitmap interface · dc162c8e

由 Fam Zheng 提交于 10月 13, 2016

HBitmap is an implementation detail of block dirty bitmap that should be hidden
from users. Introduce a BdrvDirtyBitmapIter to encapsulate the underlying
HBitmapIter.

A small difference in the interface is, before, an HBitmapIter is initialized
in place, now the new BdrvDirtyBitmapIter must be dynamically allocated because
the structure definition is in block/dirty-bitmap.c.

Two current users are converted too.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Message-id: 1476395910-8697-2-git-send-email-jsnow@redhat.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

dc162c8e

13 9月, 2016 1 次提交

mirror: auto complete active commit · b49f7ead

由 Wen Congyang 提交于 7月 27, 2016

Auto complete mirror job in background to prevent from
blocking synchronously
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Signed-off-by: NChanglong Xie <xiecl.fnst@cn.fujitsu.com>
Signed-off-by: NWang WeiWei <wangww.fnst@cn.fujitsu.com>
Message-id: 1469602913-20979-7-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

b49f7ead

08 8月, 2016 1 次提交

mirror: finish earlier on error · dbaa7b57

由 Vladimir Sementsov-Ogievskiy 提交于 8月 03, 2016

Stop to produce new async copy requests from mirror_iteration if
critical error (error action = BLOCK_ERROR_ACTION_REPORT) detected.
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

dbaa7b57

27 7月, 2016 1 次提交

mirror: double performance of the bulk stage if the disc is full · 0965a41e

由 Vladimir Sementsov-Ogievskiy 提交于 7月 14, 2016

Mirror can do up to 16 in-flight requests, but actually on full copy
(the whole source disk is non-zero) in-flight is always 1. This happens
as the request is not limited in size: the data occupies maximum available
capacity of s->buf.

The patch limits the size of the request to some artificial constant
(1 Mb here), which is not that big or small. This effectively enables
back parallelism in mirror code as it was designed.

The result is important: the time to migrate 10 Gb disk is reduced from
~350 sec to 170 sec.
Signed-off-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>
Message-id: 1468516741-82174-1-git-send-email-vsementsov@virtuozzo.com
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

0965a41e

22 7月, 2016 1 次提交

Revert "mirror: Workaround for unexpected iohandler events during completion" · d4a92a84

由 Fam Zheng 提交于 7月 13, 2016

This reverts commit ab27c3b5.

The virtio storage device host notifiers now work with
bdrv_drained_begin/end, so we don't need this hack any more.
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Acked-by: NPaolo Bonzini <pbonzini@redhat.com>

d4a92a84

20 7月, 2016 8 次提交

block: Convert BB interface to byte-based discards · 1c6c4bb7

由 Eric Blake 提交于 7月 15, 2016

Change sector-based blk_discard(), blk_co_discard(), and
blk_aio_discard() to instead be byte-based blk_pdiscard(),
blk_co_pdiscard(), and blk_aio_pdiscard().  NBD gets a lot
simpler now that ignoring the unaligned portion of a
byte-based discard request is handled under the hood by
the block layer.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-6-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

1c6c4bb7

mirror: fix request throttling in drive-mirror · cf56a3c6

由 Denis V. Lunev 提交于 6月 22, 2016

There are 2 deficiencies here:
- mirror_iteration could start several requests inside. Thus we could
  simply have more in_flight requests than MAX_IN_FLIGHT.
- keeping this in mind throttling in mirror_run which is checking
  s->in_flight == MAX_IN_FLIGHT is wrong.

The patch adds the check and throttling into mirror_iteration and fixes
the check in mirror_run() to be sure.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Message-id: 1466598927-5990-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>
(cherry picked from commit e648dc95c28fbca12e67be26a1fc4b9a0676c3fe)

cf56a3c6

mirror: improve performance of mirroring of empty disk · 4b5004d9

由 Denis V. Lunev 提交于 7月 14, 2016

We should not take into account zero blocks for delay calculations.
They are not read and thus IO throttling is not required. In the
other case VM migration with 16 Tb QCOW2 disk with 4 Gb of data takes
days.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-9-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

4b5004d9

mirror: efficiently zero out target · c7c2769c

由 Denis V. Lunev 提交于 7月 14, 2016

With a bdrv_co_write_zeroes method on a target BDS and when this method
is working as indicated by the bdrv_can_write_zeroes_with_unmap(), zeroes
will not be placed into the wire. Thus the target could be very efficiently
zeroed out. This should be done with the largest chunk possible.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-8-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

c7c2769c

mirror: optimize dirty bitmap filling in mirror_run a bit · b7d5062c

由 Denis V. Lunev 提交于 7月 14, 2016

There is no need to scan allocation tables if we have mark_all_dirty flag
set. Just mark it all dirty.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-7-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

b7d5062c

mirror: create mirror_dirty_init helper for mirror_run · c0b363ad

由 Denis V. Lunev 提交于 7月 14, 2016

The code inside the helper will be extended in the next patch. mirror_run
itself is overbloated at the moment.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-5-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
CC: Eric Blake <eblake@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

c0b363ad

mirror: create mirror_throttle helper · 49efb1f5

由 Denis V. Lunev 提交于 7月 14, 2016

The patch also places last_pause_ns from stack in mirror_run into
MirrorBlockJob structure. This helper will be useful in next patches.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-id: 1468503209-19498-4-git-send-email-den@openvz.org
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Eric Blake <eblake@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

49efb1f5

mirror: make sectors_in_flight int64_t · 531509ba

由 Denis V. Lunev 提交于 7月 14, 2016

We keep here the sum of int fields. Thus this could easily overflow,
especially when we will start sending big requests in next patches.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-3-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

531509ba

13 7月, 2016 6 次提交

Improve block job rate limiting for small bandwidth values · f14a39cc

由 Sascha Silbe 提交于 6月 28, 2016

ratelimit_calculate_delay() previously reset the accounting every time
slice, no matter how much data had been processed before. This had (at
least) two consequences:

1. The minimum speed is rather large, e.g. 5 MiB/s for commit and stream.

   Not sure if there are real-world use cases where this would be a
   problem. Mirroring and backup over a slow link (e.g. DSL) would
   come to mind, though.

2. Tests for block job operations (e.g. cancel) were rather racy

   All block jobs currently use a time slice of 100ms. That's a
   reasonable value to get smooth output during regular
   operation. However this also meant that the state of block jobs
   changed every 100ms, no matter how low the configured limit was. On
   busy hosts, qemu often transferred additional chunks until the test
   case had a chance to cancel the job.

Fix the block job rate limit code to delay for more than one time
slice to address the above issues. To make it easier to handle
oversized chunks we switch the semantics from returning a delay
_before_ the current request to a delay _after_ the current
request. If necessary, this delay consists of multiple time slice
units.

Since the mirror job sends multiple chunks in one go even if the rate
limit was exceeded in between, we need to keep track of the start of
the current time slice so we can correctly re-compute the delay for
the updated amount of data.

The minimum bandwidth now is 1 data unit per time slice. The block
jobs are currently passing the amount of data transferred in sectors
and using 100ms time slices, so this translates to 5120
bytes/second. With chunk sizes usually being O(512KiB), tests have
plenty of time (O(100s)) to operate on block jobs. The chance of a
race condition now is fairly remote, except possibly on insanely
loaded systems.
Signed-off-by: NSascha Silbe <silbe@linux.vnet.ibm.com>
Message-id: 1467127721-9564-2-git-send-email-silbe@linux.vnet.ibm.com
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

f14a39cc

coroutine: move entry argument to qemu_coroutine_create · 0b8b8753

由 Paolo Bonzini 提交于 7月 04, 2016

In practice the entry argument is always known at creation time, and
it is confusing that sometimes qemu_coroutine_enter is used with a
non-NULL argument to re-enter a coroutine (this happens in
block/sheepdog.c and tests/test-coroutine.c).  So pass the opaque value
at creation time, for consistency with e.g. aio_bh_new.

Mostly done with the following semantic patch:

@ entry1 @
expression entry, arg, co;
@@
- co = qemu_coroutine_create(entry);
+ co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry2 @
expression entry, arg;
identifier co;
@@
- Coroutine *co = qemu_coroutine_create(entry);
+ Coroutine *co = qemu_coroutine_create(entry, arg);
  ...
- qemu_coroutine_enter(co, arg);
+ qemu_coroutine_enter(co);

@ entry3 @
expression entry, arg;
@@
- qemu_coroutine_enter(qemu_coroutine_create(entry), arg);
+ qemu_coroutine_enter(qemu_coroutine_create(entry, arg));

@ reentry @
expression co;
@@
- qemu_coroutine_enter(co, NULL);
+ qemu_coroutine_enter(co);

except for the aforementioned few places where the semantic patch
stumbled (as expected) and for test_co_queue, which would otherwise
produce an uninitialized variable warning.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

0b8b8753

commit: Add 'job-id' parameter to 'block-commit' · fd62c609

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'block-commit',
allowing the user to specify the ID of the block job to be created.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

fd62c609

mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror' · 71aa9867

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_mirror' command remains unchanged.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

71aa9867

blockjob: Add 'job_id' parameter to block_job_create() · 7f0317cf

由 Alberto Garcia 提交于 7月 05, 2016

When a new job is created, the job ID is taken from the device name of
the BDS. This patch adds a new 'job_id' parameter to let the caller
provide one instead.

This patch also verifies that the ID is always unique and well-formed.
This causes problems in a couple of places where no ID is being set,
because the BDS does not have a device name.

In the case of test_block_job_start() (from test-blockjob-txn.c) we
can simply use this new 'job_id' parameter to set the missing ID.

In the case of img_commit() (from qemu-img.c) we still don't have the
API to make commit_active_start() set the job ID, so we solve it by
setting a default value. We'll get rid of this as soon as we extend
the API.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

7f0317cf

blockjob: Update description of the 'id' field · 9df229c3

由 Alberto Garcia 提交于 7月 05, 2016

The 'id' field of the BlockJob structure will be able to hold any ID,
not only a device name. This patch updates the description of that
field and the error messages where it is being used.

Soon we'll add the ability to set an arbitrary ID when creating a
block job.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

9df229c3

29 6月, 2016 4 次提交

mirror: fix misleading comments · 15d67298

由 Changlong Xie 提交于 6月 23, 2016

s/target bs/to_replace/, also we check to_replace bs is not
blocked in qmp_drive_mirror() not here
Signed-off-by: NChanglong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>
Message-id: 1466672241-22485-3-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: NJeff Cody <jcody@redhat.com>

15d67298

mirror: limit niov to IOV_MAX elements, again · e4808881

由 John Snow 提交于 6月 22, 2016

During the refactor of mirror_iteration in e5b43573,
we regressed the fix introduced in cae98cb8.

This patch re-adds IOV_MAX checking to cases where we
aren't checking alignment (and size) already.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1466625064-11280-3-git-send-email-jsnow@redhat.com
Signed-off-by: NJeff Cody <jcody@redhat.com>

e4808881

mirror: clarify mirror_do_read return code · 17612955

由 John Snow 提交于 6月 22, 2016

mirror_do_read intends to return the number of sectors processed after
the starting sector, without regard to how many sectors were processed
before the starting sector due to alignment.

Clean up the comments and code to hopefully illustrate this more clearly.

This also fixes an issue in initialization where if the mirror buffer size
is initialized to smaller than the number of sectors being requested for
transfer, we report back an incorrectly large number to the caller.
Signed-off-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1466625064-11280-2-git-send-email-jsnow@redhat.com
Signed-off-by: NJeff Cody <jcody@redhat.com>

17612955

mirror: fix trace_mirror_yield_in_flight usage in mirror_iteration() · ff04198b

由 Denis V. Lunev 提交于 6月 21, 2016

trace_mirror_yield_in_flight accepts 2nd arguments in sectors while here
we pass chunks instead.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-id: 1466518157-27140-1-git-send-email-den@openvz.org
CC: Jeff Cody <jcody@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

ff04198b

20 6月, 2016 1 次提交

mirror: follow AioContext change gracefully · 565ac01f

由 Stefan Hajnoczi 提交于 6月 16, 2016

Add block_job_pause_point() calls to mark quiescent points and make sure
to complete in-flight requests when switching AioContexts.

This patch solves undefined behavior in the mirror block job when the
BDS AioContext is changed by dataplane.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-8-git-send-email-stefanha@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

565ac01f

16 6月, 2016 3 次提交

block/mirror: Fix target backing BDS · 274fccee

由 Max Reitz 提交于 6月 10, 2016

Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

274fccee

block: Byte-based bdrv_co_do_copy_on_readv() · 244483e6

由 Kevin Wolf 提交于 6月 02, 2016

In a first step to convert the common I/O path to work on bytes rather
than sectors, this converts the copy-on-read logic that is used by
bdrv_aligned_preadv().
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

244483e6

block: Avoid bogus flags during mirroring · 73698c30

由 Eric Blake 提交于 6月 13, 2016

Commit e253f4b8 converted mirroring from sector-based bdrv_aio_*
to byte-based blk_aio_*, but failed to account for the subtle
difference in signatures (the former takes a semi-redundant length,
the latter takes a flags parameter).  Since all of our flags are
currently smaller in size than BDRV_SECTOR_SIZE, it has no ill
effects until we either perform sub-sector mirroring, or we start
asserting that no unexpected flags are set.  I found it while
testing new asserts when qemu-iotests 132 started warning about an
unknown flag 0x200000.
Signed-off-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

73698c30

26 5月, 2016 3 次提交

mirror: Use BlockBackend for I/O · e253f4b8

由 Kevin Wolf 提交于 4月 12, 2016

This changes the mirror block job to use the job's BlockBackend for
performing its I/O. job->bs isn't used by the mirroring code any more
afterwards.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

e253f4b8

mirror: Allow target that already has a BlockBackend · b8804815

由 Kevin Wolf 提交于 4月 12, 2016

We had to forbid mirroring to a target BDS that already had a BB
attached because the node swapping at job completion would add a second
BB and we didn't support multiple BBs on a single BDS at the time. Now
we do, so we can lift the restriction.

As we allow additional BlockBackends for the target, we must expect
other users to be sending requests. There may no requests be in flight
during the graph modification, so we have to drain those users now.

The core part of this patch is a revert of commit 40365552.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

b8804815

block: Convert block job core to BlockBackend · b6d2e599

由 Kevin Wolf 提交于 4月 08, 2016

This adds a new BlockBackend field to the BlockJob struct, which
coexists with the BlockDriverState while converting the individual jobs.

When creating a block job, a new BlockBackend is created on top of the
given BlockDriverState, and it is destroyed when the BlockJob ends. The
reference to the BDS is now held by the BlockBackend instead of calling
bdrv_ref/unref manually.

We have to be careful when we use bdrv_replace_in_backing_chain() in
block jobs because this changes the BDS that job->blk points to. At the
moment block jobs are too tightly coupled with their BDS, so that moving
a job to another BDS isn't easily possible; therefore, we need to just
manually undo this change afterwards.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

b6d2e599

19 5月, 2016 3 次提交

block: Remove BlockDriverState.blk · 1f0c461b

由 Kevin Wolf 提交于 3月 22, 2016

This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

1f0c461b

blockjob: Don't touch BDS iostatus · 66a0fae4

由 Kevin Wolf 提交于 4月 18, 2016

Block jobs don't actually make use of the iostatus for their BDSes, but
they manage a separate block job iostatus. Still, they require that it
is enabled for the source BDS and they enable it automatically for the
target and set the error handling mode - which ends up never being used
by the job.

This patch removes all of the BDS iostatus handling from the block job,
which removes another few bs->blk accesses.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

66a0fae4

blockjob: Don't set iostatus of target · 81e254dc

由 Kevin Wolf 提交于 4月 18, 2016

When block job errors were introduced, we assigned the iostatus of the
target BDS "just in case". The field has never been accessible for the
user because the target isn't listed in query-block.

Before we can allow the user to have a second BlockBackend on the
target, we need to clean this up. If anything, we would want to set the
iostatus for the internal BB of the job (which we can always do later),
but certainly not for a separate BB which the job doesn't even use.

As a nice side effect, this gets us rid of another bs->blk use.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

81e254dc

22 4月, 2016 1 次提交

mirror: Workaround for unexpected iohandler events during completion · ab27c3b5

由 Fam Zheng 提交于 4月 22, 2016

Commit 5a7e7a0b moved mirror_exit to a BH handler but didn't add any
protection against new requests that could sneak in just before the
BH is dispatched. For example (assuming a code base at that commit):

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch
                aio_dispatch
                  ...
                    mirror_run
                      bdrv_drain
    (a)               block_job_defer_to_main_loop
          qemu_iohandler_poll
            virtio_queue_host_notifier_read
              ...
                virtio_submit_multiwrite
    (b)           blk_aio_multiwrite

        main_loop_wait # 2
          <snip>
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

At (a) we know the BDS has no pending request. However, the same
main_loop_wait call is going to dispatch iohandlers (EventNotifier
events), which may lead to a new I/O from guest. So the invariant is
already broken at (c). Data loss.

Commit f3926945 made iohandler to use aio API.  The order of
virtio_queue_host_notifier_read and block_job_defer_to_main_loop within
a main_loop_wait becomes unpredictable, and even worse, if the host
notifier event arrives at the next main_loop_wait call, the
unpredictable order between mirror_exit and
virtio_queue_host_notifier_read is also a trouble. As shown below, this
commit made the bug easier to trigger:

    - Bug case 1:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop
              aio_ctx_dispatch (iohandler_ctx)
                virtio_queue_host_notifier_read
                  ...
                    virtio_submit_multiwrite
    (b)               blk_aio_multiwrite

        main_loop_wait # 2
          ...
                aio_dispatch
                  aio_bh_poll
    (c)             mirror_exit

    - Bug case 2:

        main_loop_wait # 1
          os_host_main_loop_wait
            g_main_context_dispatch
              aio_ctx_dispatch (qemu_aio_context)
                ...
                  mirror_run
                    bdrv_drain
    (a)             block_job_defer_to_main_loop

        main_loop_wait # 2
          ...
            aio_ctx_dispatch (iohandler_ctx)
              virtio_queue_host_notifier_read
                ...
                  virtio_submit_multiwrite
    (b)             blk_aio_multiwrite
              aio_dispatch
                aio_bh_poll
    (c)           mirror_exit

In both cases, (b) breaks the invariant wanted by (a) and (c).

Until then, the request loss has been silent. Later, 3f09bfbc added
asserts at (c) to check the invariant (in
bdrv_replace_in_backing_chain), and Max reported an assertion failure
first visible there, by doing active committing while the guest is
running bonnie++.

2.5 added bdrv_drained_begin at (a) to protect the dataplane case from
similar problems, but we never realize the main loop bug until now.

As a bandage, this patch disables iohandler's external events
temporarily together with bs->ctx.

Launchpad Bug: 1570134

Cc: qemu-stable@nongnu.org
Signed-off-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

ab27c3b5

20 4月, 2016 2 次提交

mirror: Don't extend the last sub-chunk · 4150ae60

由 Fam Zheng 提交于 4月 20, 2016

The last sub-chunk is rounded up to the copy granularity in the target
image, resulting in a larger size than the source.

Add a function to clip the copied sectors to the end.

This undoes the "wrong" changes to tests/qemu-iotests/109.out in
e5b43573. The remaining two offset changes are okay.

[ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ]
Reported-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NJeff Cody <jcody@redhat.com>

4150ae60

block/mirror: Refresh stale bitmap iterator cache · f27a2742

由 Max Reitz 提交于 4月 20, 2016

If the drive's dirty bitmap is dirtied while the mirror operation is
running, the cache of the iterator used by the mirror code may become
stale and not contain all dirty bits.

This only becomes an issue if we are looking for contiguously dirty
chunks on the drive. In that case, we can easily detect the discrepancy
and just refresh the iterator if one occurs.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

f27a2742