1. 16 Jun, 2016 · 1 commit
  2. 08 Jun, 2016 · 7 commits
    • qcow2: avoid extra flushes in qcow2 · f3c3b87d
      Committed by Denis V. Lunev
      The problem with excessive flushing was found by a couple of performance
      tests:
        - parallel directory tree creation (from 2 processes)
        - 32 cached writes + fsync at the end in a loop
      
      For the first one, results improved from 2.6 loops/sec to 3.5 loops/sec.
      Each loop creates 10^3 directories with 10 files in each.
      
      For the second one, results improved from ~600 fsync/sec to ~1100
      fsync/sec. However, this was measured on an SSD, so rotational media
      will probably not show such a large gain.
      
      qcow2_cache_flush() calls bdrv_flush() unconditionally after writing
      cache entries of a particular cache. This can lead to as many as
      2 additional fdatasyncs inside bdrv_flush.
      
      We can simply skip all fdatasync calls inside qcow2_co_flush_to_os,
      as bdrv_flush is guaranteed to do the job. These flushes would be needed
      to keep writes to the different caches in the right order, but that is
      not necessary in the current code base, because the ordering is already
      ensured by the flush in qcow2_cache_flush_dependency().
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Pavel Borzenkov <pborzenkov@virtuozzo.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
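The arithmetic behind "as many as 2 additional fdatasyncs" can be sketched with a counter. This is a minimal model under stated assumptions, not QEMU code: `struct sync_model`, `cache_flush()` and `flush_image()` are hypothetical names, and the model assumes the flush path writes back two metadata caches (for example the L2 table cache and the refcount block cache) before the final bdrv_flush().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model that only counts fdatasync calls; nothing is written. */
struct sync_model {
    int fdatasyncs;
};

/* Stand-in for flushing one metadata cache.  With sync_after_write set it
 * behaves like the old code, which ended every cache flush with its own
 * bdrv_flush() and therefore its own fdatasync. */
void cache_flush(struct sync_model *m, bool sync_after_write)
{
    /* Dirty cache entries would be written back here. */
    if (sync_after_write) {
        m->fdatasyncs++;
    }
}

/* Flush two metadata caches, then issue the final flush exactly once. */
int flush_image(bool per_cache_sync)
{
    struct sync_model m = { 0 };

    cache_flush(&m, per_cache_sync); /* e.g. the L2 table cache */
    cache_flush(&m, per_cache_sync); /* e.g. the refcount block cache */
    m.fdatasyncs++;                  /* the outer bdrv_flush() */
    return m.fdatasyncs;
}
```

With per-cache syncing, flush_image(true) counts three fdatasyncs; with the per-cache syncs skipped, flush_image(false) counts one, which is the saving the patch relies on.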
    • qcow2: Convert to bdrv_co_pwrite_zeroes() · 5544b59f
      Committed by Eric Blake
      Another step on our continuing quest to switch to byte-based
      interfaces.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Switch bdrv_write_zeroes() to byte interface · 74021bc4
      Committed by Eric Blake
      Rename to bdrv_pwrite_zeroes() to let the compiler ensure we
      cater to the updated semantics.  Do the same for bdrv_co_write_zeroes().
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
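For a caller, moving from the sector-based call to the byte-based one mostly means scaling both the position and the length by the 512-byte sector size. The helper below is a hypothetical sketch of that translation (`struct byte_req` and `sectors_to_bytes()` are not QEMU names); it also assumes the byte length stays an `int`, so it rejects sector counts whose byte count would overflow.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* same value as QEMU's BDRV_SECTOR_SIZE */

/* Hypothetical byte-based description of a write-zeroes request. */
struct byte_req {
    int64_t offset;
    int bytes;
};

/* Translate an old-style (sector_num, nb_sectors) request into bytes.
 * Returns 0 on success, or -1 if the values cannot be represented. */
int sectors_to_bytes(int64_t sector_num, int nb_sectors, struct byte_req *out)
{
    if (sector_num < 0 || sector_num > INT64_MAX / SECTOR_SIZE) {
        return -1;
    }
    if (nb_sectors < 0 || nb_sectors > INT_MAX / SECTOR_SIZE) {
        return -1; /* the byte count must still fit in an int */
    }
    out->offset = sector_num * SECTOR_SIZE;
    out->bytes = nb_sectors * SECTOR_SIZE;
    return 0;
}
```

For example, a request for 8 sectors at sector 64 becomes 4096 bytes at offset 32768.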
    • block: Track write zero limits in bytes · cf081fca
      Committed by Eric Blake
      Another step towards removing sector-based interfaces: convert
      the maximum write and minimum alignment values from sectors to
      bytes.  Rename the variables to let the compiler check that all
      users are converted to the new semantics.
      
      The maximum remains an int as long as BDRV_REQUEST_MAX_SECTORS
      is constrained by INT_MAX (so the largest supported write_zeroes
      is just under 2G, not a full 2G). Changing operation lengths to
      unsigned or to 64 bits would require a much bigger audit, and it is
      debatable whether we even want to do it (since at the core, a
      32-bit platform will still have ssize_t as its underlying limit
      on write()).
      
      Meanwhile, alignment is changed to 'uint32_t', since it makes no
      sense to have an alignment larger than the maximum write, and it is
      less painful to use an unsigned type with well-defined behavior
      in bit operations than to worry about what happens if a driver
      mistakenly supplies a negative alignment.
      
      Add an assert that no one was trying to use sectors to get a
      write zeroes larger than 2G, and therefore that a later conversion
      to bytes won't be impacted by keeping the limit at 32 bits.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
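The two type choices can be illustrated in isolation. In this sketch, `struct zero_limits` only echoes the idea of byte-based limits and is not QEMU's BlockLimits; `limits_from_sectors()` and `align_down()` are hypothetical helpers. The first assert plays the role of the "no write zeroes of 2G or more" check, and the power-of-two requirement in align_down() is an assumption made so that masking with `alignment - 1` is valid.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SECTOR_SIZE 512

/* Hypothetical write-zeroes limits, expressed in bytes. */
struct zero_limits {
    int max_pwrite_zeroes;            /* signed: bounded by INT_MAX */
    uint32_t pwrite_zeroes_alignment; /* unsigned: safe in bit operations */
};

/* Convert sector-based limits to bytes.  The asserts reject any sector
 * value whose byte equivalent would not fit the new field types. */
struct zero_limits limits_from_sectors(int max_sectors, int align_sectors)
{
    struct zero_limits l;

    assert(max_sectors >= 0 && max_sectors <= INT_MAX / SECTOR_SIZE);
    assert(align_sectors >= 0 &&
           (uint32_t)align_sectors <= UINT32_MAX / SECTOR_SIZE);
    l.max_pwrite_zeroes = max_sectors * SECTOR_SIZE;
    l.pwrite_zeroes_alignment = (uint32_t)align_sectors * SECTOR_SIZE;
    return l;
}

/* Round an offset down to a power-of-two alignment using unsigned masking,
 * which is well defined for every input. */
uint64_t align_down(uint64_t offset, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return offset & ~(uint64_t)(alignment - 1);
}
```

limits_from_sectors(8, 128) yields a 4096-byte maximum and a 65536-byte alignment, and align_down(70000, 65536) returns 65536.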
    • qcow2: Catch more unaligned write_zero into zero cluster · ebb718a5
      Committed by Eric Blake
      is_zero_cluster() and is_zero_cluster_top_locked() are used only
      by qcow2_co_write_zeroes().  The former is too broad (we don't
      care if the sectors we are about to overwrite are non-zero, only
      that all other sectors in the cluster are zero), so it needs to
      be called up to twice, with narrower limits; rename it and add the
      needed parameter.  The latter can be inlined for
      more compact code.
      
      The testsuite change shows that we now have a sparser top file
      when an unaligned write_zeroes overwrites the only portion of
      the backing file with data.
      
      Based on a patch proposal by Denis V. Lunev.
      
      CC: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
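The narrower check can be sketched with a per-sector bitmap standing in for the block-status queries QEMU actually performs. All names here (`range_is_zero()`, `can_zero_whole_cluster()`, `CLUSTER_SECTORS`) are hypothetical, and a 64 KiB cluster of 512-byte sectors is assumed. Only the head before the request and the tail after it are inspected, hence up to two calls.

```c
#include <assert.h>
#include <stdbool.h>

#define CLUSTER_SECTORS 128 /* assumed: 64 KiB cluster, 512-byte sectors */

/* Hypothetical stand-in for a block-status query: true if sectors
 * [start, start + count) of the cluster read as zero. */
bool range_is_zero(const bool nonzero[CLUSTER_SECTORS], int start, int count)
{
    for (int i = start; i < start + count; i++) {
        if (nonzero[i]) {
            return false;
        }
    }
    return true;
}

/* Can a write-zeroes of sectors [start, start + count) be turned into a
 * whole zero cluster?  The sectors being overwritten do not matter; only
 * the uncovered head and tail must already read as zero. */
bool can_zero_whole_cluster(const bool nonzero[CLUSTER_SECTORS],
                            int start, int count)
{
    assert(start >= 0 && count > 0 && start + count <= CLUSTER_SECTORS);
    int tail = start + count;

    if (start > 0 && !range_is_zero(nonzero, 0, start)) {
        return false; /* head contains data */
    }
    if (tail < CLUSTER_SECTORS &&
        !range_is_zero(nonzero, tail, CLUSTER_SECTORS - tail)) {
        return false; /* tail contains data */
    }
    return true;
}
```

With only sector 10 non-zero, a request covering sectors 8 to 15 can still become a whole zero cluster because that sector is about to be overwritten, whereas a check over the whole cluster would refuse it. A request covering sectors 20 to 27 cannot, because sector 10 lies in its head.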
    • qcow2: add tracepoints for qcow2_co_write_zeroes · 5a64e942
      Committed by Denis V. Lunev
      This patch follows the conventions of all other tracepoints in qcow2,
      such as those in qcow2_co_writev. I think they should all report values
      in the same units, or be changed all together.
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Eric Blake <eblake@redhat.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <1463476543-3087-4-git-send-email-den@openvz.org>
      [eblake: typo fix in commit message]
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • qcow2: simplify logic in qcow2_co_write_zeroes · ba142846
      Committed by Denis V. Lunev
      Since the previous commit, unaligned requests occupy only one cluster.
      Simplify the code to take this into account.
      
      In other words, the caller is now buggy if it ever passes us an unaligned
      request that crosses cluster boundaries (the only requests that can cross
      boundaries will be aligned).
      
      There are no other changes in this patch.
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      CC: Eric Blake <eblake@redhat.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      Message-Id: <1463476543-3087-3-git-send-email-den@openvz.org>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
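The caller contract stated above can be written as a predicate. The function names are hypothetical and all sizes are in bytes; this is a sketch of the rule, not the check QEMU performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the request starts and ends on cluster boundaries. */
bool is_cluster_aligned(uint64_t offset, uint64_t bytes, uint64_t cluster_size)
{
    return offset % cluster_size == 0 && bytes % cluster_size == 0;
}

/* True if a non-empty request lies entirely within one cluster. */
bool within_one_cluster(uint64_t offset, uint64_t bytes, uint64_t cluster_size)
{
    return bytes > 0 &&
           offset / cluster_size == (offset + bytes - 1) / cluster_size;
}

/* Hypothetical check of the caller contract: an unaligned request must stay
 * inside one cluster; only aligned requests may span several clusters. */
bool request_is_valid(uint64_t offset, uint64_t bytes, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    return is_cluster_aligned(offset, bytes, cluster_size) ||
           within_one_cluster(offset, bytes, cluster_size);
}
```

With 64 KiB clusters, a 4 KiB request at 32 KiB is valid (unaligned but inside one cluster), a 128 KiB request at offset 0 is valid (aligned), and an 8 KiB request at 60 KiB is invalid because it is unaligned and crosses into the next cluster.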
  3. 19 May, 2016 · 3 commits
  4. 12 May, 2016 · 3 commits
    • block: Drop superfluous invalidating bs->file from drivers · c9e9e9c6
      Committed by Fam Zheng
      Now they are invalidated by the block layer, so it's not necessary to
      do this in block drivers' implementations of .bdrv_invalidate_cache.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Reviewed-by: Alberto Garcia <berto@igalia.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
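The division of labour can be sketched with a two-node graph. `struct node`, `invalidate_cache()` and `format_invalidate()` are hypothetical; the sketch assumes the generic layer recurses into the child before calling the driver hook, which is what makes a driver-side invalidation of bs->file superfluous.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature block graph node with an optional protocol child
 * ("file") and an optional driver hook. */
struct node {
    struct node *file;
    int invalidated; /* how many times this node was invalidated */
    void (*drv_invalidate)(struct node *bs);
};

/* Generic layer: invalidate the child first, then call the driver hook.
 * Because the recursion lives here, drivers need not touch bs->file. */
void invalidate_cache(struct node *bs)
{
    if (bs->file) {
        invalidate_cache(bs->file);
    }
    if (bs->drv_invalidate) {
        bs->drv_invalidate(bs);
    }
    bs->invalidated++;
}

/* A format driver hook that only refreshes its own state. */
void format_invalidate(struct node *bs)
{
    (void)bs; /* e.g. re-read the image header here */
}
```

Invalidating the format node once also invalidates its file child exactly once, without the driver hook touching the child.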
    • qcow2: improve qcow2_co_write_zeroes() · 2928abce
      Committed by Denis V. Lunev
      qcow2_co_write_zeroes() may be called with a partial block. This
      can be triggered synthetically with
          qemu-io -c "write -z 32k 4k"
      and can happen in real life with qemu-nbd, under the following
      conditions:
          (1) qemu-nbd is started with --detect-zeroes=on and is connected to the
              kernel NBD client
          (2) a third-party program opens the kernel NBD device with O_DIRECT
          (3) the third-party program performs a write operation with a memory
              buffer that is not page-aligned
      In this case qcow2_co_write_zeroes() cannot perform the operation by
      marking the entire cluster as zeroed, so it returns ENOTSUP. The caller
      then switches to the non-optimized path and writes real zeroes to the disk.

      The patch creates a shortcut: if the block reads as zeroes (e.g. because
      it is unallocated), the request is extended to cover the full block.
      What the user sees in this block does not change. Before the patch the
      block was filled with real zeroes in the image; after the patch it is
      marked as zeroed in metadata. In both cases, later changes in the
      backing chain do not affect it.
      
      Kevin, thank you for a cool suggestion.
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      CC: Kevin Wolf <kwolf@redhat.com>
      CC: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
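The shortcut can be sketched as a function that widens a partial-cluster request when the cluster already reads as zeroes, and otherwise reports -ENOTSUP so that the caller falls back to writing explicit zeroes. The names are hypothetical, the `reads_as_zero` argument replaces the real block-status lookup, and the usage below assumes qcow2's default 64 KiB cluster size.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical write-zeroes request, in bytes. */
struct zero_req {
    uint64_t offset;
    uint64_t bytes;
};

/* Widen a partial request to its whole cluster if that cluster reads as
 * zeroes; otherwise return -ENOTSUP and leave the request untouched. */
int extend_to_cluster(struct zero_req *req, uint64_t cluster_size,
                      bool reads_as_zero)
{
    uint64_t start = req->offset - req->offset % cluster_size;
    uint64_t end = req->offset + req->bytes;

    if (end % cluster_size) {
        end += cluster_size - end % cluster_size; /* round end up */
    }
    if (start == req->offset && end == req->offset + req->bytes) {
        return 0; /* already cluster-aligned: nothing to widen */
    }
    if (!reads_as_zero || end - start != cluster_size) {
        return -ENOTSUP; /* caller falls back to writing real zeroes */
    }
    req->offset = start;
    req->bytes = cluster_size;
    return 0;
}
```

For the qemu-io example above (`write -z 32k 4k`), a 4 KiB request at 32 KiB becomes a 64 KiB request at offset 0 when the cluster reads as zeroes; when it does not, the request is left unchanged and -ENOTSUP is returned.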
    • block: Allow BDRV_REQ_FUA through blk_pwrite() · 8341f00d
      Committed by Eric Blake
      We have several block drivers that understand BDRV_REQ_FUA,
      and emulate it in the block layer for the rest by a full flush.
      But without a way to actually request BDRV_REQ_FUA during a
      pass-through blk_pwrite(), FUA-aware block drivers like NBD are
      forced to repeat the emulation logic of a full flush regardless
      of whether the backend they are writing to could do it more
      efficiently.
      
      This patch just wires up a flags argument; followup patches
      will actually make use of it in the NBD driver and in qemu-io.
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Acked-by: Denis V. Lunev <den@openvz.org>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
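The flag plumbing can be sketched as a generic write path that either forwards FUA or emulates it. `REQ_FUA`, `struct driver` and `generic_pwrite()` are hypothetical stand-ins, and the per-driver `supported_write_flags` mask is an assumption used to express "FUA-aware" in the sketch, not a description of QEMU's exact API.

```c
#include <assert.h>

#define REQ_FUA 0x1u /* hypothetical stand-in for BDRV_REQ_FUA */

/* Hypothetical driver that records what the generic layer asked of it. */
struct driver {
    unsigned supported_write_flags;
    int writes, fua_writes, flushes;
};

/* Generic write path: pass FUA down if the driver understands it,
 * otherwise emulate it with a plain write followed by a full flush. */
int generic_pwrite(struct driver *drv, unsigned flags)
{
    if ((flags & REQ_FUA) && !(drv->supported_write_flags & REQ_FUA)) {
        drv->writes++;
        drv->flushes++; /* emulation: full flush after the write */
        return 0;
    }
    if (flags & REQ_FUA) {
        drv->fua_writes++; /* native FUA, no separate flush */
    } else {
        drv->writes++;
    }
    return 0;
}
```

A FUA-aware driver sees one FUA write and no flush; a driver without FUA support sees a plain write followed by a full flush.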
  5. 13 Apr, 2016 · 1 commit
  6. 30 Mar, 2016 · 2 commits
    • block: Always set writeback mode in blk_new_open() · 72e775c7
      Committed by Kevin Wolf
      All callers of blk_new_open() either don't rely on the WCE bit set after
      blk_new_open() because they explicitly set it anyway, or they pass
      BDRV_O_CACHE_WB unconditionally.
      
      This patch changes blk_new_open() so that it always enables writeback
      mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that
      used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
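The new contract of blk_new_open() amounts to an assertion plus an unconditional setting. The sketch is hypothetical: `struct backend`, `backend_open()` and `CACHE_WB_FLAG` (given an arbitrary value) only illustrate the rule and are not QEMU definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_WB_FLAG 0x40 /* hypothetical stand-in for BDRV_O_CACHE_WB */

/* Hypothetical backend with a write-cache-enable (WCE) bit. */
struct backend {
    int open_flags;
    bool enable_write_cache;
};

/* Callers must no longer pass the writeback flag; writeback mode is
 * switched on unconditionally instead. */
void backend_open(struct backend *blk, int flags)
{
    assert(!(flags & CACHE_WB_FLAG));
    blk->open_flags = flags;
    blk->enable_write_cache = true;
}
```

A caller that still needs write-through behaviour would have to clear the WCE bit explicitly after opening, which matches the callers described above that set it anyway.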
    • block: move encryption deprecation warning into qcow code · e6ff69bf
      Committed by Daniel P. Berrange
      For a couple of releases we have been warning
      
        Encrypted images are deprecated
        Support for them will be removed in a future release.
        You can use 'qemu-img convert' to convert your image to an unencrypted one.
      
      This warning was issued by system emulators, qemu-img, qemu-nbd
      and qemu-io. Such a broad warning was issued because the original
      intention was to rip out all the code for dealing with encryption
      inside the QEMU block layer APIs.
      
      The new block encryption framework used for the LUKS driver does
      not rely on the unloved block layer API for encryption keys,
      instead using the QOM 'secret' object type. It is thus no longer
      appropriate to warn about encryption unconditionally.
      
      When the qcow/qcow2 drivers are converted to use the new encryption
      framework too, it will be practical to keep AES-CBC support present
      for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability
      with older QEMU versions and liberation of data from existing encrypted
      qcow2 files.
      
      This change moves the warning out of the generic block code and
      into the qcow/qcow2 drivers. Further, the warning is set to only
      appear when running the system emulators, since qemu-img, qemu-io,
      qemu-nbd are expected to support qcow2 encryption long term now that
      the maintenance burden has been eliminated.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  7. 23 Mar, 2016 · 1 commit
  8. 18 Mar, 2016 · 1 commit
    • qapi: Don't special-case simple union wrappers · 32bafa8f
      Committed by Eric Blake
      Simple unions were carrying a special case that hid their 'data'
      QMP member from the resulting C struct, via the hack method
      QAPISchemaObjectTypeVariant.simple_union_type().  But by using
      the work we started by unboxing flat union and alternate
      branches, coupled with the ability to visit the members of an
      implicit type, we can now expose the simple union's implicit
      type in qapi-types.h:
      
      | struct q_obj_ImageInfoSpecificQCow2_wrapper {
      |     ImageInfoSpecificQCow2 *data;
      | };
      |
      | struct q_obj_ImageInfoSpecificVmdk_wrapper {
      |     ImageInfoSpecificVmdk *data;
      | };
      ...
      | struct ImageInfoSpecific {
      |     ImageInfoSpecificKind type;
      |     union { /* union tag is @type */
      |         void *data;
      |-        ImageInfoSpecificQCow2 *qcow2;
      |-        ImageInfoSpecificVmdk *vmdk;
      |+        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
      |+        q_obj_ImageInfoSpecificVmdk_wrapper vmdk;
      |     } u;
      | };
      
      Doing this removes asymmetry between QAPI's QMP side and its
      C side (both sides now expose 'data'), and means that the
      treatment of a simple union as sugar for a flat union is now
      equivalent in both languages (previously the two approaches used
      a different layer of dereferencing, where the simple union could
      be converted to a flat union with equivalent C layout but
      different {} on the wire, or to an equivalent QMP wire form
      but with different C representation).  Using the implicit type
      also lets us get rid of the simple_union_type() hack.
      
      Of course, now all clients of simple unions have to adjust from
      using su->u.member to using su->u.member.data; while this touches
      a number of files in the tree, some earlier cleanup patches
      helped minimize the change to the initialization of a temporary
      variable rather than every single member access.  The generated
      qapi-visit.c code is also affected by the layout change:
      
      |@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member
      |     }
      |     switch (obj->type) {
      |     case IMAGE_INFO_SPECIFIC_KIND_QCOW2:
      |-        visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err);
      |+        visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err);
      |         break;
      |     case IMAGE_INFO_SPECIFIC_KIND_VMDK:
      |-        visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err);
      |+        visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err);
      |         break;
      |     default:
      |         abort();
      Signed-off-by: Eric Blake <eblake@redhat.com>
      Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
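On the client side, the change is one extra member access. The following is a cut-down, hypothetical copy of the generated types (the real ImageInfoSpecificQCow2 has more members, `refcount_bits` is used purely for illustration, and the `void *data` union member is omitted); it shows a branch being read through `u.qcow2.data`.

```c
#include <assert.h>

/* Cut-down, hypothetical versions of the generated types shown above. */
typedef struct ImageInfoSpecificQCow2 {
    int refcount_bits; /* illustrative member only */
} ImageInfoSpecificQCow2;

typedef struct q_obj_ImageInfoSpecificQCow2_wrapper {
    ImageInfoSpecificQCow2 *data;
} q_obj_ImageInfoSpecificQCow2_wrapper;

typedef enum {
    IMAGE_INFO_SPECIFIC_KIND_QCOW2
} ImageInfoSpecificKind;

typedef struct ImageInfoSpecific {
    ImageInfoSpecificKind type;
    union { /* union tag is @type */
        q_obj_ImageInfoSpecificQCow2_wrapper qcow2;
    } u;
} ImageInfoSpecific;

/* Clients now go through the wrapper's 'data' member:
 * su->u.qcow2.data instead of su->u.qcow2. */
int qcow2_refcount_bits(const ImageInfoSpecific *info)
{
    assert(info->type == IMAGE_INFO_SPECIFIC_KIND_QCOW2);
    return info->u.qcow2.data->refcount_bits;
}
```

The extra `.data` hop is exactly the "su->u.member to su->u.member.data" adjustment the commit describes.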
  9. 17 Mar, 2016 · 2 commits
    • blockdev: Split monitor reference from BB creation · efaa7c4e
      Committed by Max Reitz
      Before this patch, blk_new() automatically assigned a name to the new
      BlockBackend and considered it referenced by the monitor. This patch
      removes the implicit monitor_add_blk() call from blk_new() (and
      consequently the monitor_remove_blk() call from blk_delete(), too) and
      thus blk_new() (and related functions) no longer take a BB name
      argument.
      
      In fact, there is only a single point where blk_new()/blk_new_open() is
      called and the new BB is monitor-owned, and that is in blockdev_init().
      Besides thus relieving us from having to invent names for all of the BBs
      we use in qemu-img, this fixes a bug where qemu cannot create a new
      image if there already is a monitor-owned BB named "image".
      
      If a BB and its BDS tree are created in a single operation, as of this
      patch the BDS tree will be created before the BB is given a name
      (whereas it was the other way around before). This results in a minor
      change to the output of iotest 087, whose reference output is amended
      accordingly.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
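The split between creating a backend and naming it can be sketched as follows. `struct bb`, `bb_new()`, `monitor_add()` and the fixed-size registry are hypothetical simplifications of BlockBackend and monitor_add_blk(); the point is that an anonymous backend can now coexist with a monitor-owned one called "image".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MONITOR_BBS 8

/* Hypothetical backend: anonymous until the monitor adopts it. */
struct bb {
    const char *name; /* NULL while not monitor-owned */
};

/* Hypothetical monitor registry of named backends. */
static struct bb *monitor_bbs[MAX_MONITOR_BBS];
static int n_monitor_bbs;

/* Creation takes no name, so internal users never clash with
 * monitor-owned names. */
struct bb bb_new(void)
{
    struct bb b = { NULL };
    return b;
}

/* Naming is a separate, explicit step.
 * Returns 0 on success, -1 if the name is taken or the table is full. */
int monitor_add(struct bb *b, const char *name)
{
    for (int i = 0; i < n_monitor_bbs; i++) {
        if (strcmp(monitor_bbs[i]->name, name) == 0) {
            return -1;
        }
    }
    if (n_monitor_bbs == MAX_MONITOR_BBS) {
        return -1;
    }
    b->name = name;
    monitor_bbs[n_monitor_bbs++] = b;
    return 0;
}
```

Once a monitor-owned backend named "image" exists, an internal, unnamed backend can still be created; only a second attempt to register the name "image" fails.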
    • qapi: Drop QERR_UNKNOWN_BLOCK_FORMAT_FEATURE · a55448b3
      Committed by Max Reitz
      Just specifying a custom string is simpler in basically all places that
      used it, and in addition, specifying the BB or node name is something we
      generally do not do in other error messages when opening a BDS, so we
      should not do it here.
      
      This changes the output for iotest 036 (to the better, in my opinion),
      so the reference output needs to be changed accordingly.
      Signed-off-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  10. 14 Mar, 2016 · 2 commits
  11. 03 Feb, 2016 · 2 commits
  12. 20 Jan, 2016 · 7 commits
  13. 13 Jan, 2016 · 1 commit
    • error: Use error_prepend() where it makes obvious sense · e43bfd9c
      Committed by Markus Armbruster
      Done with this Coccinelle semantic patch
      
          @@
          expression FMT, E1, E2;
          expression list ARGS;
          @@
          -    error_setg(E1, FMT, ARGS, error_get_pretty(E2));
          +    error_propagate(E1, E2);/*###*/
          +    error_prepend(E1, FMT/*@@@*/, ARGS);
      
      followed by manual cleanup, first because I can't figure out how to
      make Coccinelle transform strings, and second to get rid of now
      superfluous error_propagate().
      
      We now use or propagate the original error whole instead of just its
      message obtained with error_get_pretty().  This avoids suppressing its
      hint (see commit 50b7b000), but I can't see how the errors touched in
      this commit could come with hints.  It also improves the message
      printed with &error_abort when we screw up (see commit 1e9b65bb).
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
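What prepending buys over rebuilding an error from its pretty-printed message can be sketched with a toy error type. `struct error` and `err_prepend()` are hypothetical and are not QEMU's Error API; the sketch only shows that prepending in place keeps the rest of the original error, such as its hint.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy error: a message plus an optional hint. */
struct error {
    char msg[256];
    const char *hint;
};

/* Prepend a printf-style prefix to err->msg in place.  Everything else in
 * the error object, such as the hint, is left untouched. */
void err_prepend(struct error *err, const char *fmt, ...)
{
    char prefix[256];
    char old[256];
    va_list ap;

    va_start(ap, fmt);
    vsnprintf(prefix, sizeof(prefix), fmt, ap);
    va_end(ap);

    memcpy(old, err->msg, sizeof(old));
    /* Output longer than the buffer is truncated, never overflowed. */
    snprintf(err->msg, sizeof(err->msg), "%s%s", prefix, old);
}
```

Prepending "Could not open 'a.qcow2': " to an error whose message is "No such file or directory" yields the combined message, and the hint survives unchanged.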
  14. 18 Dec, 2015 · 7 commits