提交 · ce83ee57f6fa69280a0330f1b49ef75b0ef0742a · openeuler / qemu

18 8月, 2016 1 次提交

block: fix deadlock in bdrv_co_flush · ce83ee57

由 Evgeny Yakovlev 提交于 8月 17, 2016

The following commit
    commit 3ff2f67a
    Author: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
    Date:   Mon Jul 18 22:39:52 2016 +0300
    block: ignore flush requests when storage is clean
has introduced a regression.

There is a problem that it is still possible for 2 requests to execute
in non sequential fashion and sometimes this results in a deadlock
when bdrv_drain_one/all are called for BDS with such stalled requests.

1. Current flushed_gen and flush_started_gen is 1.
2. Request 1 enters bdrv_co_flush to with write_gen 1 (i.e. the same
   as flushed_gen). It gets past flushed_gen != flush_started_gen and
   sets flush_started_gen to 1 (again, the same it was before).
3. Request 1 yields somewhere before exiting bdrv_co_flush
4. Request 2 enters bdrv_co_flush with write_gen 2. It gets past
   flushed_gen != flush_started_gen and sets flush_started_gen to 2.
5. Request 2 runs to completion and sets flushed_gen to 2
6. Request 1 is resumed, runs to completion and sets flushed_gen to 1.
   However flush_started_gen is now 2.

From here on out flushed_gen is always != to flush_started_gen and all
further requests will wait on flush_queue. This change replaces
flush_started_gen with an explicitly tracked active flush request.
Signed-off-by: NEvgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Message-id: 1471457214-3994-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

ce83ee57

04 8月, 2016 1 次提交

block: Cater to iscsi with non-power-of-2 discard · b8d0a980

由 Eric Blake 提交于 7月 21, 2016

Dell Equallogic iSCSI SANs have a very unusual advertised geometry:

$ iscsi-inq -e 1 -c $((0xb0)) iscsi://XXX/0
wsnz:0
maximum compare and write length:1
optimal transfer length granularity:0
maximum transfer length:0
optimal transfer length:0
maximum prefetch xdread xdwrite transfer length:0
maximum unmap lba count:30720
maximum unmap block descriptor count:2
optimal unmap granularity:30720
ugavalid:1
unmap granularity alignment:0
maximum write same length:30720

which says that both the maximum and the optimal discard size
is 15M.  It is not immediately apparent if the device allows
discard requests not aligned to the optimal size, nor if it
allows discards at a finer granularity than the optimal size.

I tried to find details in the SCSI Commands Reference Manual
Rev. A on what valid values of maximum and optimal sizes are
permitted, but while that document mentions a "Block Limits
VPD Page", I couldn't actually find documentation of that page
or what values it would have, or if a SCSI device has an
advertisement of its minimal unmap granularity.  So it is not
obvious to me whether the Dell Equallogic device is compliance
with the SCSI specification.

Fortunately, it is easy enough to support non-power-of-2 sizing,
even if it means we are less efficient than truly possible when
targetting that device (for example, it means that we refuse to
unmap anything that is not a multiple of 15M and aligned to a
15M boundary, even if the device truly does support a smaller
granularity where unmapping actually works).
Reported-by: NPeter Lieven <pl@kamp.de>
Signed-off-by: NEric Blake <eblake@redhat.com>
Message-Id: <1469129688-22848-5-git-send-email-eblake@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b8d0a980

20 7月, 2016 4 次提交

block: Kill .bdrv_co_discard() · 02aefe43

由 Eric Blake 提交于 7月 15, 2016

Now that all drivers have a byte-based .bdrv_co_pdiscard(), we
no longer need to worry about the sector-based version.  We can
also relax our minimum alignment to 1 for drivers that support it.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-18-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

02aefe43

block: Add .bdrv_co_pdiscard() driver callback · 47a5486d

由 Eric Blake 提交于 7月 15, 2016

There's enough drivers with a sector-based callback that it will
be easier to switch one at a time.  This patch adds a byte-based
callback, and then after all drivers are swapped, we'll drop the
sector-based callback.

[checkpatch doesn't like the space after coroutine_fn in
block_int.h, but it's consistent with the rest of the file]
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-10-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

47a5486d

block: Convert .bdrv_aio_discard() to byte-based · 4da444a0

由 Eric Blake 提交于 7月 15, 2016

Another step towards byte-based interfaces everywhere.  Replace
the sector-based driver callback .bdrv_aio_discard() with a new
byte-based .bdrv_aio_pdiscard().  Only raw-posix and RBD drivers
are affected, so it was not worth splitting into multiple patches.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468624988-423-9-git-send-email-eblake@redhat.com
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>

4da444a0

dirty-bitmap: operate with int64_t amount · 6d078599

由 Denis V. Lunev 提交于 7月 14, 2016

Underlying HBitmap operates even with uint64_t. Thus this change is safe.
This would be useful f.e. to mark entire bitmap dirty in one call.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NVladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NJohn Snow <jsnow@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1468503209-19498-2-git-send-email-den@openvz.org
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Jeff Cody <jcody@redhat.com>
Signed-off-by: NJeff Cody <jcody@redhat.com>

6d078599

19 7月, 2016 1 次提交

block: ignore flush requests when storage is clean · 3ff2f67a

由 Evgeny Yakovlev 提交于 7月 18, 2016

Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).
Signed-off-by: NEvgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: NJohn Snow <jsnow@redhat.com>

3ff2f67a

13 7月, 2016 5 次提交

commit: Add 'job-id' parameter to 'block-commit' · fd62c609

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'block-commit',
allowing the user to specify the ID of the block job to be created.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

fd62c609

stream: Add 'job-id' parameter to 'block-stream' · 2323322e

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'block-stream',
allowing the user to specify the ID of the block job to be created.

The HMP 'block_stream' command remains unchanged.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

2323322e

backup: Add 'job-id' parameter to 'blockdev-backup' and 'drive-backup' · 70559d49

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'blockdev-backup'
and 'drive-backup', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_backup' command remains unchanged.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

70559d49

mirror: Add 'job-id' parameter to 'blockdev-mirror' and 'drive-mirror' · 71aa9867

由 Alberto Garcia 提交于 7月 05, 2016

This patch adds a new optional 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror', allowing the user to specify the ID of the block
job to be created.

The HMP 'drive_mirror' command remains unchanged.
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

71aa9867

stream: Fix prototype of stream_start() · 29338003

由 Alberto Garcia 提交于 7月 05, 2016

'stream-start' has a parameter called 'backing-file', which is the
string to be written to bs->backing when the job finishes.

In the stream_start() implementation it is called 'backing_file_str',
but it the prototype in the header file it is called 'base_id'.

This patch fixes it so the name is the same in both cases and is
consistent with other cases (like commit_start()).
Signed-off-by: NAlberto Garcia <berto@igalia.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

29338003

05 7月, 2016 6 次提交

block: Convert bdrv_co_preadv/pwritev to BdrvChild · a03ef88f

由 Kevin Wolf 提交于 6月 20, 2016

This is the final patch for converting the common I/O path to take
a BdrvChild parameter instead of BlockDriverState.

The completion of this conversion means that all users that perform I/O
on an image need to actually hold a reference (in the form of BdrvChild,
possible as part of a BlockBackend) to that image. This also protects
against inconsistent use of BlockBackend vs. BlockDriverState functions
because direct use of a BlockDriverState isn't possible any more and
blk->root is private for block-backends.c.

In addition, we can now distinguish different users in the I/O path,
and the future op blockers work is going to add assertions based on
permissions stored in BdrvChild.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

a03ef88f

block: Use bool as appropriate for BDS members · 54115412

由 Eric Blake 提交于 6月 23, 2016

Using int for values that are only used as booleans is confusing.
While at it, rearrange a couple of members so that all the bools
are contiguous.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

54115412

block: Move request_alignment into BlockLimit · a5b8dd2c

由 Eric Blake 提交于 6月 23, 2016

It makes more sense to have ALL block size limit constraints
in the same struct.  Improve the documentation while at it.

Simplify a couple of conditionals, now that we have audited and
documented that request_alignment is always non-zero.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

a5b8dd2c

block: Switch discard length bounds to byte-based · b9f7855a

由 Eric Blake 提交于 6月 23, 2016

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_discard and
discard_alignment.  Rename them, using 'pdiscard' as an aid to
track which remaining discard interfaces need conversion, and so
that the compiler will help us catch the change in semantics
across any rebased code.  The BlockLimits type is now completely
byte-based; and in iscsi.c, sector_limits_lun2qemu() is no
longer needed.

pdiscard_alignment is made unsigned (we use power-of-2 alignments
as bitmasks, where unsigned is easier to think about) while
leaving max_pdiscard signed (since we still have an 'int'
interface); this is comparable to what commit cf081fca did for
write zeroes limits.  We may later want to make everything an
unsigned 64-bit limit - but that requires a bigger code audit.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

b9f7855a

block: Wording tweaks to write zeroes limits · 29cc6a68

由 Eric Blake 提交于 6月 23, 2016

Improve the documentation of the write zeroes limits, to mention
additional constraints that drivers should observe.  Worth squashing
into commit cf081fca, if that hadn't been pushed already :)
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

29cc6a68

block: Switch transfer length bounds to byte-based · 5def6b80

由 Eric Blake 提交于 6月 23, 2016

Sector-based limits are awkward to think about; in our on-going
quest to move to byte-based interfaces, convert max_transfer_length
and opt_transfer_length.  Rename them (dropping the _length suffix)
so that the compiler will help us catch the change in semantics
across any rebased code, and improve the documentation.  Use unsigned
values, so that we don't have to worry about negative values and
so that bit-twiddling is easier; however, we are still constrained
by 2^31 of signed int in most APIs.

When a value comes from an external source (iscsi and raw-posix),
sanitize the results to ensure that opt_transfer is a power of 2.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

5def6b80

20 6月, 2016 1 次提交

block: use safe iteration over AioContext notifiers · e8a095da

由 Stefan Hajnoczi 提交于 6月 16, 2016

It's possible that an AioContext notifier user was close to finishing
when .detach_aio_context() or .attached_aio_context() is called.  In
that case they may call bdrv_remove_aio_context_notifier() during the
callback.

Use safe iteration to avoid crashing when the notifier list is modified
during iteration.  We must not only handle the case where the current
aio notifier is removed during a callback but also the one where any
other aio notifier is removed.

The next patch adds an AioContext notifier for block jobs and they
really could be terminating just as .detach_aio_context() is invoked.
Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Message-id: 1466096189-6477-6-git-send-email-stefanha@redhat.com

e8a095da

16 6月, 2016 4 次提交

block/mirror: Fix target backing BDS · 274fccee

由 Max Reitz 提交于 6月 10, 2016

Currently, we are trying to move the backing BDS from the source to the
target in bdrv_replace_in_backing_chain() which is called from
mirror_exit(). However, mirror_complete() already tries to open the
target's backing chain with a call to bdrv_open_backing_file().

First, we should only set the target's backing BDS once. Second, the
mirroring block job has a better idea of what to set it to than the
generic code in bdrv_replace_in_backing_chain() (in fact, the latter's
conditions on when to move the backing BDS from source to target are not
really correct).

Therefore, remove that code from bdrv_replace_in_backing_chain() and
leave it to mirror_complete().

Depending on what kind of mirroring is performed, we furthermore want to
use different strategies to open the target's backing chain:

- If blockdev-mirror is used, we can assume the user made sure that the
  target already has the correct backing chain. In particular, we should
  not try to open a backing file if the target does not have any yet.

- If drive-mirror with mode=absolute-paths is used, we can and should
  reuse the already existing chain of nodes that the source BDS is in.
  In case of sync=full, no backing BDS is required; with sync=top, we
  just link the source's backing BDS to the target, and with sync=none,
  we use the source BDS as the target's backing BDS.
  We should not try to open these backing files anew because this would
  lead to two BDSs existing per physical file in the backing chain, and
  we would like to avoid such concurrent access.

- If drive-mirror with mode=existing is used, we have to use the
  information provided in the physical image file which means opening
  the target's backing chain completely anew, just as it has been done
  already.
  If the target's backing chain shares images with the source, this may
  lead to multiple BDSs per physical image file. But since we cannot
  reliably ascertain this case, there is nothing we can do about it.
Signed-off-by: NMax Reitz <mreitz@redhat.com>
Message-id: 20160610185750.30956-3-mreitz@redhat.com
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NMax Reitz <mreitz@redhat.com>

274fccee

block: Remove bs->zero_beyond_eof · c9d20029

由 Kevin Wolf 提交于 6月 01, 2016

It is always true for open images now.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

c9d20029

block: Make bdrv_load/save_vmstate coroutine_fns · 1a8ae822

由 Kevin Wolf 提交于 6月 09, 2016

This allows drivers to share code between normal I/O and vmstate
accesses.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

1a8ae822

block: Make .bdrv_load_vmstate() vectored · 5ddda0b8

由 Kevin Wolf 提交于 6月 09, 2016

This brings it in line with .bdrv_save_vmstate().
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NStefan Hajnoczi <stefanha@redhat.com>

5ddda0b8

08 6月, 2016 3 次提交

block: Kill bdrv_co_write_zeroes() · c1499a5e

由 Eric Blake 提交于 6月 01, 2016

Now that all drivers have been converted to a byte interface,
we no longer need a sector interface.
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NKevin Wolf <kwolf@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

c1499a5e

block: Add .bdrv_co_pwrite_zeroes() · d05aa8bb

由 Eric Blake 提交于 6月 01, 2016

Update bdrv_co_do_write_zeroes() to be byte-based, and select
between the new byte-based bdrv_co_pwrite_zeroes() or the old
bdrv_co_write_zeroes().  The next patches will convert drivers,
then remove the old interface.
Signed-off-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

d05aa8bb

block: Track write zero limits in bytes · cf081fca

由 Eric Blake 提交于 6月 01, 2016

Another step towards removing sector-based interfaces: convert
the maximum write and minimum alignment values from sectors to
bytes.  Rename the variables to let the compiler check that all
users are converted to the new semantics.

The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
is constrained by INT_MAX (this means that we can't even
support a 2G write_zeroes, but just under it) - changing
operation lengths to unsigned or to 64-bits is a much bigger
audit, and debatable if we even want to do it (since at the
core, a 32-bit platform will still have ssize_t as its
underlying limit on write()).

Meanwhile, alignment is changed to 'uint32_t', since it makes no
sense to have an alignment larger than the maximum write, and
less painful to use an unsigned type with well-defined behavior
in bit operations than to have to worry about what happens if
a driver mistakenly supplies a negative alignment.

Add an assert that no one was trying to use sectors to get a
write zeroes larger than 2G, and therefore that a later conversion
to bytes won't be impacted by keeping the limit at 32 bits.
Signed-off-by: NEric Blake <eblake@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

cf081fca

26 5月, 2016 1 次提交

block: Fix reconfiguring graph with drained nodes · 36fe1331

由 Kevin Wolf 提交于 5月 17, 2016

When changing the BlockDriverState that a BdrvChild points to while the
node is currently drained, we must call the .drained_end() parent
callback. Conversely, when this means attaching a new node that is
already drained, we need to call .drained_begin().

bdrv_root_attach_child() takes now an opaque parameter, which is needed
because the callbacks must also be called if we're attaching a new child
to the BlockBackend when the root node is already drained, and they need
a way to identify the BlockBackend. Previously, child->opaque was set
too late and the callbacks would still see it as NULL.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>

36fe1331

19 5月, 2016 9 次提交

block: Remove BlockDriverState.blk · 1f0c461b

由 Kevin Wolf 提交于 3月 22, 2016

This patch removes the remaining users of bs->blk, which will allow us
to have multiple BBs on top of a single BDS. In the meantime, all checks
that are currently in place to prevent the user from creating such
setups can be switched to bdrv_has_blk() instead of accessing BDS.blk.

Future patches can allow them and e.g. enable users to mirror to a block
device that already has a BlockBackend on it.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

1f0c461b

block: User BdrvChild callback for device name · 4c265bf9

由 Kevin Wolf 提交于 2月 26, 2016

In order to get rid of bs->blk for bdrv_get_device_name() and
bdrv_get_device_or_node_name(), ask all parents for their name and
simply pick the first one.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

4c265bf9

block: Use BdrvChild callbacks for change_media/resize · 5c8cab48

由 Kevin Wolf 提交于 2月 24, 2016

We want to get rid of BlockDriverState.blk in order to allow multiple
BlockBackends per BDS. Converting the device callbacks in block.c (which
assume a single BlockBackend) to per-child callbacks gets us rid of the
first few instances.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>

5c8cab48

block: Decouple throttling from BlockDriverState · 7ca7f0f6

由 Kevin Wolf 提交于 3月 22, 2016

This moves the throttling related part of the BDS life cycle management
to BlockBackend. The throttling group reference is now kept even when no
medium is inserted.

With this commit, throttling isn't disabled and then re-enabled any more
during graph reconfiguration. This fixes the temporary breakage of I/O
throttling when used with live snapshots or block jobs that manipulate
the graph.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

7ca7f0f6

block: Drain throttling queue with BdrvChild callback · c2066af0

由 Kevin Wolf 提交于 3月 22, 2016

This removes the last part of I/O throttling from block/io.c and moves
it to the BlockBackend.

Instead of having knowledge about throttling inside io.c, we can call a
BdrvChild callback .drained_begin/end, which happens to drain the
throttled requests for BlockBackend parents.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

c2066af0

block: Introduce BdrvChild.opaque · 22aa8b24

由 Kevin Wolf 提交于 2月 24, 2016

BlockBackends use it to get a back pointer from BdrvChild to
BlockBackend in any BdrvChildRole callbacks.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

22aa8b24

block: Move I/O throttling configuration functions to BlockBackend · 97148076

由 Kevin Wolf 提交于 3月 21, 2016

Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

97148076

block: Move throttling fields from BDS to BB · 27ccdd52

由 Kevin Wolf 提交于 3月 21, 2016

This patch changes where the throttling state is stored (used to be the
BlockDriverState, now it is the BlockBackend), but it doesn't actually
make it a BB level feature yet. For example, throttling is still
disabled when the BDS is detached from the BB.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

27ccdd52

block: throttle-groups: Use BlockBackend pointers internally · 31dce3cc

由 Kevin Wolf 提交于 3月 21, 2016

As a first step towards moving I/O throttling to the BlockBackend level,
this patch changes all pointers in struct ThrottleGroup from referencing
a BlockDriverState to referencing a BlockBackend.

This change is valid because we made sure that throttling can only be
enabled on BDSes which have a BB attached.
Signed-off-by: NKevin Wolf <kwolf@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>

31dce3cc

12 5月, 2016 4 次提交

Add new block driver interface to add/delete a BDS's child · e06018ad

由 Wen Congyang 提交于 5月 10, 2016

In some cases, we want to take a quorum child offline, and take
another child online.
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Signed-off-by: Nzhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: NGonglei <arei.gonglei@huawei.com>
Signed-off-by: NChanglong Xie <xiecl.fnst@cn.fujitsu.com>
Reviewed-by: NMax Reitz <mreitz@redhat.com>
Reviewed-by: NAlberto Garcia <berto@igalia.com>
Message-id: 1462865799-19402-2-git-send-email-xiecl.fnst@cn.fujitsu.com
Signed-off-by: NMax Reitz <mreitz@redhat.com>

e06018ad

block: Honor BDRV_REQ_FUA during write_zeroes · 465fe887

由 Eric Blake 提交于 5月 03, 2016

The block layer has a couple of cases where it can lose
Force Unit Access semantics when writing a large block of
zeroes, such that the request returns before the zeroes
have been guaranteed to land on underlying media.

SCSI does not support FUA during WRITESAME(10/16); FUA is only
supported if it falls back to WRITE(10/16).  But where the
underlying device is new enough to not need a fallback, it
means that any upper layer request with FUA semantics was
silently ignoring BDRV_REQ_FUA.

Conversely, NBD has situations where it can support FUA but not
ZERO_WRITE; when that happens, the generic block layer fallback
to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu
2.6) was losing the FUA flag.

The problem of losing flags unrelated to ZERO_WRITE has been
latent in bdrv_co_do_write_zeroes() since commit aa7bfbff, but
back then, it did not matter because there was no FUA flag.  It
became observable when commit 93f5e6d8 paved the way for flags
that can impact correctness, when we should have been using
bdrv_co_writev_flags() with modified flags.  Compare to commit
9eeb6dd1, which got flag manipulation right in
bdrv_co_do_zero_pwritev().

Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA.  When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE.

The fix is do to a cleanup bdrv_co_flush() at the end of the
operation if any step in the middle relied on a BDS that does
not natively support FUA for that step (note that we don't
need to flush after every operation, if the operation is broken
into chunks based on bounce-buffer sizing).  Each BDS gains a
new flag .supported_zero_flags, which parallels the use of
.supported_write_flags but only when accessing a zero write
operation (the flags MUST be different, because of SCSI having
different semantics based on WRITE vs. WRITESAME; and also
because BDRV_REQ_MAY_UNMAP only makes sense on zero writes).

Also fix some documentation to describe -ENOTSUP semantics,
particularly since iscsi depends on those semantics.

Down the road, we may want to add a driver where its
.bdrv_co_pwritev() honors all three of BDRV_REQ_FUA,
BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise
this via bs->supported_write_flags for blocks opened by that
driver; such a driver should NOT supply .bdrv_co_write_zeroes
nor .supported_zero_flags.  But none of the drivers touched
in this patch want to do that (the act of writing zeroes is
different enough from normal writes to deserve a second
callback).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

465fe887

block: Make supported_write_flags a per-bds property · 4df863f3

由 Eric Blake 提交于 5月 03, 2016

Pre-patch, .supported_write_flags lives at the driver level, which
means we are blindly declaring that all block devices using a
given driver will either equally support FUA, or that we need a
fallback at the block layer.  But there are drivers where FUA
support is a per-block decision: the NBD block driver is dependent
on the remote server advertising NBD_FLAG_SEND_FUA (and has
fallback code to duplicate the flush that the block layer would do
if NBD had not set .supported_write_flags); and the iscsi block
driver is dependent on the mode sense bits advertised by the
underlying device (and is currently silently ignoring FUA requests
if the underlying device does not support FUA).

The fix is to make supported flags as a per-BDS option, set during
.bdrv_open().  This patch moves the variable and fixes NBD and iscsi
to set it only conditionally; later patches will then further
simplify the NBD driver to quit duplicating work done at the block
layer, as well as tackle the fact that SCSI does not support FUA
semantics on WRITESAME(10/16) but only on WRITE(10/16).
Signed-off-by: NEric Blake <eblake@redhat.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Acked-by: NStefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

4df863f3

Allow users to specify the vmdk virtual hardware version. · f249924e

由 Janne Karhunen 提交于 5月 03, 2016

Vmdk images have metadata to indicate the vmware virtual
hardware version image was created/tested to run with.
Allow users to specify that version via new 'hwversion'
option.

[ kwolf: Adjust qemu-iotests common.filter ]
Signed-off-by: NJanne Karhunen <Janne.Karhunen@gmail.com>
Reviewed-by: NFam Zheng <famz@redhat.com>
Signed-off-by: NKevin Wolf <kwolf@redhat.com>

f249924e