1. 17 Dec, 2014 17 commits
    • E
      getstats: add new flag for block backing chain · 4bffafb2
      Eric Blake committed
      This patch introduces access to allocation information about
      a backing chain of a live domain.  While querying storage
      volumes for read-only disks could provide some of the details,
      we do NOT want to read() a file while qemu is writing it.
      Also, there is one case where we have to rely on qemu: when
      doing a block commit into a backing file, where that file is
      stored in qcow2 format on a host block device, we want to know
      the current highest write offset into that image, in order to
      know if the disk must be resized larger.  qemu-img does not
      (currently) show this information, and none of the earlier
      block APIs were extensible enough to expose it.  But
      virDomainListGetStats is perfect for the job!
      
      We don't need a new group of statistics, as the existing block
      group is sufficient.  On the other hand, as existing libvirt
      releases already report 1:1 mapping of block.count to <disk>
      devices, changing the array size could confuse older clients;
      and even with newer clients, the time and memory taken to
      report additional statistics is not always necessary (backing
      files are generally read-only except for block-commit, so while
      read statistics may change, sizing statistics will not).  So
      the choice here is to add a new flag that only newer callers
      will pass, when they are prepared for the additional information.
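
The opt-in behaviour described above can be modelled with a small sketch. This is not libvirt code: the function `block_stats` and the disk dictionaries are hypothetical, and the per-entry field name `backingIndex` follows libvirt's public documentation for this flag rather than anything stated in this commit message.

```python
def block_stats(disks, backing=False):
    """Return one entry per reported block.

    disks: list of {"name": str, "chain": [path, ...]}, top image first.
    backing: models a caller passing the new backing-chain flag.
    """
    entries = []
    for disk in disks:
        # Without the flag, only the top image of each disk is reported,
        # so the number of entries stays 1:1 with the <disk> devices.
        chain = disk["chain"] if backing else disk["chain"][:1]
        if not chain:
            # Empty drive (for example a cdrom without media): name only.
            entries.append({"name": disk["name"]})
            continue
        for depth, path in enumerate(chain):
            entry = {"name": disk["name"], "path": path}
            if backing:
                # Extra field that appears only when the flag is in use.
                entry["backingIndex"] = depth
            entries.append(entry)
    return entries


disks = [
    {"name": "hda", "chain": ["top.qcow2", "base.qcow2"]},
    {"name": "hdc", "chain": []},
]
old_view = block_stats(disks)                # what older clients expect
new_view = block_stats(disks, backing=True)  # one entry per chain element
```

An older client that never passes the flag keeps seeing one entry per disk, which is the compatibility argument made above.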
      
      This patch introduces the new API, but it will take more
      patches to get it implemented for qemu.
      
      * include/libvirt/libvirt-domain.h
      (VIR_CONNECT_GET_ALL_DOMAINS_STATS_BACKING): New flag.
      * src/libvirt-domain.c (virConnectGetAllDomainStats): Document it,
      and add a new field when it is in use.
      * tools/virsh-domain-monitor.c (cmdDomstats): Use new flag.
      * tools/virsh.pod (domstats): Document it.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      getstats: prepare for dynamic block.count stat · 14ef1f62
      Eric Blake committed
      A coming patch will make it optionally possible to list backing
      chain block stats; in this mode of operation, block.count is no
      longer the number of <disks> in the domain, but the number of
      blocks in the array being reported.  We still want block.count
      listed first, but rather than iterate the tree twice (once to
      count, and once to list stats), it's easier to just touch things
      up after the fact.
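
A minimal sketch of the "touch up after the fact" idea, assuming a flat list of key/value pairs as the output format. The function `collect_block_stats` and its input layout are hypothetical and only illustrate the single-pass approach; the real change lives in qemuDomainGetStatsBlock.

```python
def collect_block_stats(disks):
    """Walk the disks once and fix up block.count at the end."""
    # Reserve the first slot so block.count is still listed first.
    stats = [("block.count", 0)]
    count = 0
    for disk in disks:
        # An empty chain still yields one entry for the drive itself.
        for path in disk["chain"] or [None]:
            stats.append((f"block.{count}.name", disk["name"]))
            if path is not None:
                stats.append((f"block.{count}.path", path))
            count += 1
    # Patch the placeholder now that the real number of entries is known.
    stats[0] = ("block.count", count)
    return stats


stats = collect_block_stats([
    {"name": "hda", "chain": ["top.qcow2", "base.qcow2"]},
    {"name": "hdc", "chain": []},
])
```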
      
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
      after the fact.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      getstats: report block sizes for offline domains · 596a1371
      Eric Blake committed
      The prior refactoring can now be put to use. With the same domain
      as the earlier commit 7b499262 (one qcow2 disk and an empty
      cdrom drive):
      $ virsh domstats --block foo
      Domain: 'foo'
        block.count=2
        block.0.name=hda
        block.0.path=/var/lib/libvirt/images/foo.qcow2
        block.0.allocation=1309614080
        block.0.capacity=42949672960
        block.0.physical=1309671424
        block.1.name=hdc
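
For scripts that consume this output, the `key=value` lines can be grouped per block. The sketch below is only an illustration: the helper names `parse_domstats` and `group_blocks` are hypothetical, and it assumes the output layout shown above.

```python
SAMPLE = """\
Domain: 'foo'
  block.count=2
  block.0.name=hda
  block.0.path=/var/lib/libvirt/images/foo.qcow2
  block.0.allocation=1309614080
  block.0.capacity=42949672960
  block.0.physical=1309671424
  block.1.name=hdc
"""


def parse_domstats(text):
    """Turn 'key=value' lines into a dict, converting digit-only values."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skips the "Domain: ..." header and blank lines
        key, value = line.split("=", 1)
        stats[key] = int(value) if value.isdigit() else value
    return stats


def group_blocks(stats):
    """Collect block.<n>.<field> keys into one dict per block."""
    blocks = [{} for _ in range(stats.get("block.count", 0))]
    for key, value in stats.items():
        parts = key.split(".", 2)
        if len(parts) == 3 and parts[0] == "block" and parts[1].isdigit():
            blocks[int(parts[1])][parts[2]] = value
    return blocks


blocks = group_blocks(parse_domstats(SAMPLE))
```

The empty cdrom drive ends up with only a `name` key, matching the absence of size fields for `hdc` in the output.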
      
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
      qemuStorageLimitsRefresh to report offline statistics.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      qemu: refactor blockinfo data gathering · 8de6544e
      Eric Blake committed
      Create a helper function that can be reused for gathering block
      info from virDomainListGetStats.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Split guts...
      (qemuStorageLimitsRefresh): ...into new helper function.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      qemu: fix bugs in blockstats · 0282ca45
      Eric Blake committed
      The documentation for virDomainBlockInfo was confusing: it stated
      that 'physical' was the size of the container, then gave an example
      of it being the amount of storage used by a sparse file (that is,
      for a sparse raw image on a regular file, the wording implied
      capacity==physical, while allocation was smaller; but the example
      instead claimed physical==allocation).  Since we use 'physical' for
      the last offset of a block device, we should do likewise for
      regular files.
      
      Furthermore, the example claimed that for a qcow2 regular file,
      allocation==physical.  At the time the code was first written,
      this was true (qcow2 files were allocated sequentially, and were
      never sparse, so the last sector written happened to also match
      the disk space occupied); but modern qemu does much better and
      can punch holes for a qcow2 with allocation < physical.
      
      Basically, after this patch, the three fields are now reliably
      mapped as:
       'capacity' - how much storage the guest can see (equal to
      physical for raw images, determined by image metadata otherwise)
       'allocation' - how much storage the image occupies (similar to
      what 'du' would report)
       'physical' - the last offset of the image (similar to what 'ls'
      would report)
      
      'capacity' can be larger than 'physical' (such as for a qcow2
      image that does not vary much from a backing file) or smaller
      (such as for a qcow2 file with lots of internal snapshots).
      Likewise, 'allocation' can be (slightly) larger than 'physical'
      (such as counting the tail of cluster allocations required to
      round a file size up to filesystem granularity) or smaller
      (for a sparse file).  A block-resize operation changes capacity
      (which, for raw images, also changes physical); many non-raw
      images automatically grow physical and allocation as necessary
      when starting with an allocation smaller than capacity; and even
      when capacity and physical stay unchanged, allocation can change
      when converting sectors from holes to data or back.
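
The three sizes can be illustrated outside libvirt for the simplest case, a sparse raw image on a regular file. The sketch below is not libvirt code: the helper `raw_image_sizes` is hypothetical, and it assumes a POSIX system where `os.stat` exposes `st_blocks` in 512-byte units.

```python
import os
import tempfile


def raw_image_sizes(path):
    """Report the three sizes for a raw image stored in a regular file."""
    st = os.stat(path)
    physical = st.st_size            # last offset, similar to what 'ls' shows
    allocation = st.st_blocks * 512  # space occupied, similar to 'du'
    capacity = physical              # raw images: the guest sees the whole file
    return {"capacity": capacity, "allocation": allocation, "physical": physical}


with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "sparse.raw")
    with open(image, "wb") as f:
        f.truncate(1024 * 1024)   # 1 MiB file consisting of a hole
        f.write(b"\xff" * 4096)   # turn the first 4 KiB into real data
    sizes = raw_image_sizes(image)
```

On a filesystem that supports sparse files, `allocation` typically comes out well below `physical` here; the exact value depends on the filesystem, so no figure is given.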
      
      Note that this does not change semantics for qcow2 images stored
      on block devices; there, we still rely on qemu to report the
      highest written extent for allocation.  So using this API to
      track when to extend a block device because a qcow2 image is
      about to exceed a threshold will not see any changes.
      
      Also, note that virStorageVolInfo is unfortunately limited to
      just 'capacity' and 'allocation' (we can't expand it to add
      'physical', although we can expand the XML to add it there);
      historically, that struct's 'allocation' value has reported
      file size for qcow2 files (what this patch terms 'physical'
      for a domain block device), but disk usage for raw files (what
      this patch terms 'allocation').  So follow-up patches will be
      needed to make storage volumes report the same allocation
      values and get at physical values, where those differ.
      
      * include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
      documentation to match saner definition.
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
      files, physical size is capacity, not allocation.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      getstats: rearrange blockinfo gathering · 05e702cf
      Eric Blake committed
      Ultimately, we want to avoid read()ing a file while qemu is running.
      We still have to open() block devices to determine their physical
      size, but that is safer.  This patch rearranges code to group
      together all code that reads the image, to make it easier for later
      patches to skip the metadata collection when possible.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
      disk up front.  Place metadata reading next to use.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      getstats: perform recursion in monitor collection · b1802714
      Eric Blake committed
      When requested in a later patch, the QMP command results are now
      examined recursively.  As qemu_driver will eventually have to
      read items out of the hash table as stored by this patch, the
      computation of backing alias string is done in a shared location.
      
      * src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
      * src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
      * src/qemu/qemu_monitor_json.c
      (qemuMonitorJSONGetOneBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      getstats: prepare monitor collection for recursion · 7b11f5e5
      Eric Blake committed
      A future patch will allow recursion into backing chains when
      collecting block stats.  This patch should not change behavior,
      but merely moves out the common code that will be reused once
      recursion is enabled, and adds the parameter that will turn on
      recursion.
      
      * src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
      (qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
      although it is ignored for now.
      * src/qemu/qemu_monitor.c (qemuMonitorGetAllBlockStatsInfo)
      (qemuMonitorBlockStatsUpdateCapacity): Likewise.
      * src/qemu/qemu_monitor_json.h
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
      * src/qemu/qemu_monitor_json.c
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
      split...
      (qemuMonitorJSONGetOneBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
      (qemuMonitorJSONGetBlockStatsInfo): Update caller.
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
      * src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      qemu: let blockinfo reuse virStorageSource · 89646e69
      Eric Blake committed
      Right now, grabbing blockinfo always calls stat on the disk, then
      opens the image to determine the capacity, using a throw-away
      virStorageSourcePtr.  This has a couple of drawbacks:
      
      1. We are calling stat and opening a file on every invocation of
      the API.  However, there are cases where the stats should NOT be
      changing between successive calls (if a domain is running, no
      one should be changing the physical size of a block device or raw
      image behind our backs; capacity of read-only files should not
      be changing; and we are the gateway to the block-resize command
      to know when the capacity of read-write files should be changing).
      True, we still have to use stat in some cases (a sparse raw file
      changes allocation if it is read-write and the amount of holes is
      changing, and a read-write qcow2 image stored in a file changes
      physical size if it was not fully pre-allocated).  But for
      read-only images, even this should be something we can remember
      from the previous time, rather than repeating every call.
      
      2. We want to enhance the power of virDomainListGetStats, by
      sharing code.  But we already have a virStorageSourcePtr for
      each disk, and it would be easier to reuse the common structure
      than to have to worry about the one-off virDomainBlockInfoPtr.
      
      While this patch does not optimize reuse of information in point
      1, it does get us closer to being able to do so, by updating a
      structure that survives between consecutive calls.
      
      * src/util/virstoragefile.h (_virStorageSource): Add physical, to
      mirror virDomainBlockInfo; rearrange fields to match public struct.
      (virStorageSourceCopy): Copy the new field.
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
      storage source, then copy to block info.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      qemu: refactor blockinfo job handling · a20c3aaf
      Eric Blake committed
      In order for a future patch to virDomainListGetStats to reuse
      some code for determining disk usage of offline domains, we
      need to make it easier to pull out part of the guts of grabbing
      blockinfo.  The current implementation grabs a job fairly late
      in the game, while getstats will already own a job; reordering
      things so that the job is always grabbed up front in both
      functions will make it easier to pull out the common code.
      This patch results in grabbing a job in cases where one was not
      previously needed, but as it is a query job, it should not be
      noticeably slower.
      
      This patch touches the same code as the fix for CVE-2014-6458
      (commit b7992595); in that patch, we avoided hotplug changing
      a disk reference during the time of obtaining a monitor lock
      by copying all data we needed and no longer referencing disk;
      this patch goes the other way and ensures that by holding the
      job, the disk cannot be changed so we no longer need to worry
      about the disk being invalidated across the monitor lock.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
      control to be outside of disk information.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • E
      build: fix typo in previous patch · 9d128a20
      Eric Blake committed
      * src/util/virfile.c (safezero_mmap): Fix missing semicolon.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • M
      util: Fix fallocate stubs for mingw build · 9bce4386
      Martin Kletzander committed
      When any of the functions modified in commit 214c687b took the false
      branch, the function itself used none of its parameters, resulting in an
      "unused parameter" error.  Rewriting these functions as the stubs we use
      elsewhere should fix the problem.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
    • M
      qemu: Free saved error in qemuDomainSetVcpusFlags · 4d1e3943
      Martin Kletzander committed
      Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
      not supposed to reset the error.  The usual procedure was followed, saving
      the error to a temporary variable, but it was never freed, and so it leaked.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
    • M
      qemu: Add missing goto error in qemuRestoreCgroupState · 86759ec6
      Martin Kletzander committed
      Commit af2a1f05 tried to clearly separate each condition in
      qemuRestoreCgroupState() for the sake of readability; however, one
      condition body went missing.  That means the body of the next
      condition was executed only if both of them were true, which is
      impossible, resulting in dead code and a logic error.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
    • M
      conf: Fix invalid condition when parsing storage owner · 57c008f8
      Martin Kletzander committed
      In commit d2632d60 we agreed that we want the parsed uid to properly
      overflow, but only to -1; however, the value was read into a long and then
      wrapped into uid_t.  That meant it failed on 32-bit systems.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
    • J
      virstoragefile: Have virStorageFileResize use safezero · 18f03166
      John Ferlan committed
      Currently virStorageFileResize() function uses build conditionals to
      choose either the posix_fallocate() or syscall(SYS_fallocate) with no
      fallback in order to preallocate the space in the newly resized file.
      
      Since the safezero code has a similar set of conditionals, modify the
      resize and safezero code to allow the resize logic to make use of
      safezero, unifying the look and feel of the code paths.
      
      Add a new boolean (resize) to safezero() to make the optional decision
      whether to try syscall(SYS_fallocate) if the posix_fallocate fails because
      HAVE_POSIX_FALLOCATE is not defined (eg, return -1 and errno == 0).
      
      Create a local safezero_sys_fallocate in order to handle the resize
      code paths that support that.  If it is not present, set errno = ENOSYS
      in order to allow the caller to handle the failure scenarios.
      Signed-off-by: John Ferlan <jferlan@redhat.com>
    • J
      virfile: Refactor safezero · 214c687b
      John Ferlan committed
      Currently build conditionals decide which of two safezero() functions
      should be built - either the posix_fallocate() or mmap() with a fallback
      to a slower safewrite() algorithm in order to preallocate space in a raw file.
      
      This patch refactors safezero to use static functions for either
      posix_fallocate or mmap/safewrite. The build conditionals still exist, but
      only around shorter sections of code.
      
      The posix_fallocate path will make use of the ret/errno setting to contain
      the logic for safezero to decide whether it needs to fallback to other
      algorithms. A return of -1 with errno not changed will indicate the conditional
      is not present; otherwise, a return of -1 with errno change indicates the
      call was made and it failed (no functional difference to current algorithm).
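
The "not available" versus "called and failed" distinction can be sketched as a fallback chain. This is an analogy in Python, not the C code from src/util/virfile.c: where the C version signals "not compiled in" by returning -1 with errno unchanged, the sketch uses `None` for a missing function and lets `OSError` stand for a real failure. The name `safezero_sketch` and its `fallocate` parameter are hypothetical.

```python
import os
import tempfile

_ZEROS = bytes(64 * 1024)


def safezero_sketch(fd, offset, length,
                    fallocate=getattr(os, "posix_fallocate", None)):
    """Preallocate `length` bytes at `offset`, falling back to plain writes."""
    if fallocate is not None:
        # The call is available: an OSError raised here is a genuine
        # failure, so it propagates and no fallback is attempted.
        fallocate(fd, offset, length)
        return "fallocate"
    # The call is not available on this platform: use the slow path
    # and write zeros in chunks.
    os.lseek(fd, offset, os.SEEK_SET)
    remaining = length
    while remaining > 0:
        remaining -= os.write(fd, _ZEROS[:remaining])
    return "write"


# Force the slow path to show the fallback in isolation.
fd, path = tempfile.mkstemp()
try:
    method = safezero_sketch(fd, 0, 100000, fallocate=None)
    size = os.fstat(fd).st_size
finally:
    os.close(fd)
    os.unlink(path)
```

Keeping the availability check separate from error handling means a real allocation failure is not silently retried through the slower path.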
      
      The mmap/safewrite option changes only slightly to handle the ftruncate
      failure for mmap. That is, previously if the ftruncate failed, there was
      no fallback to the slow safewrite option.
      Signed-off-by: John Ferlan <jferlan@redhat.com>
  2. 16 Dec, 2014 14 commits
  3. 15 Dec, 2014 9 commits