1. 21 Dec 2014, 1 commit
    • qemu: completely rework reference counting · 540c339a
      Martin Kletzander committed
      There is one problem that causes various errors in the daemon.  When a
      domain is waiting for a job, it is unlocked while waiting on the
      condition.  However, if that domain is, for example, transient and is
      being removed by another API (e.g. cancelling an incoming migration),
      it gets unref'd.  If the first call, which was waiting, then fails to
      get the job, it unrefs the domain object, and because that was the
      last reference, the whole domain object gets cleared.  However, when
      finishing the call, the API must unlock the domain, and it has no way
      to know whether the object was cleared or not (short of some ugly
      temporary variable, but let's scratch that).
      
      The root cause is that our APIs don't ref the objects they are using and
      all use the implicit reference that the object has when it is in the
      domain list.  That reference can be removed when the API is waiting for
      a job.  And because no API takes its own reference, we end up with
      the ugly checking of the return value of virObjectUnref() that we
      have everywhere.
      
      This patch changes qemuDomObjFromDomain() to ref the domain (using
      virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
      should be the only function in which the return value of
      virObjectUnref() is checked.  This makes all reference counting
      deterministic and makes the code a bit clearer.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
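      The ownership rule described above can be illustrated with a minimal,
      single-threaded C sketch. It is not libvirt code. dom_obj,
      dom_find_ref() and dom_end_api() are hypothetical stand-ins for the
      roles played by virDomainObj, virDomainObjListFindByUUIDRef() and
      qemuDomObjEndAPI(), and the object lock is modeled as a plain flag.

      ```c
      #include <stdbool.h>
      #include <stdlib.h>

      /* Hypothetical reference-counted domain object (not virDomainObj). */
      typedef struct {
          int refs;    /* owners: the domain list plus each running API */
          bool locked; /* stand-in for the object mutex */
      } dom_obj;

      static dom_obj *dom_new(void)
      {
          dom_obj *obj = calloc(1, sizeof(*obj));
          if (obj)
              obj->refs = 1; /* the implicit reference held by the list */
          return obj;
      }

      /* Drop one reference; returns false once the object has been freed. */
      static bool dom_unref(dom_obj *obj)
      {
          if (--obj->refs == 0) {
              free(obj);
              return false;
          }
          return true;
      }

      /* Lookup that takes the caller's own reference and the lock. */
      static dom_obj *dom_find_ref(dom_obj *obj)
      {
          obj->refs++;
          obj->locked = true;
          return obj;
      }

      /* Single exit point for an API: unlock while the API still owns a
       * reference, then drop that reference. This is the only place that
       * needs to care whether the unref freed the object. */
      static void dom_end_api(dom_obj **objp)
      {
          if (!*objp)
              return;
          (*objp)->locked = false;
          dom_unref(*objp);
          *objp = NULL;
      }
      ```

      In the scenario from the commit message, removing the transient
      domain only drops the list's reference. The waiting API still holds
      its own, so the unlock in dom_end_api() always touches a live object.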
  2. 19 Dec 2014, 3 commits
    • disable vCPU pinning with TCG mode · 65686e5a
      Daniel P. Berrange committed
      Although QMP returns info about vCPU threads in TCG mode, the
      data it returns is mostly lies. Only the first vCPU has a valid
      thread_id returned. The thread_id given for the other vCPUs is
      in fact the main emulator thread. All vCPUs actually run under
      the same thread in TCG mode.
      
      Our vCPU pinning code is not at all able to cope with this,
      so if you try to set CPU affinity per-vCPU you end up with
      weird errors:
      
      error: Failed to start domain instance-00000007
      error: cannot set CPU affinity on process 24365: Invalid argument
      
      Since few people will care about the performance of TCG with
      strict CPU pinning, let's just disable that for now, so we get
      a clear error message:
      
      error: Failed to start domain instance-00000007
      error: Requested operation is not valid: cpu affinity is not supported
    • Don't setup fake CPU pids for old QEMU · b07f3d82
      Daniel P. Berrange committed
      The code assumes that def->vcpus == nvcpupids, so when we set up
      fake CPU pids for old QEMU with nvcpupids == 1, we cause the
      later code to read off the end of the array. This has fun results
      like sched_setaffinity(0, ...), which changes libvirtd's own CPU
      affinity, or even better sched_setaffinity($RANDOM, ...), which
      changes the affinity of a random OS process.
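      Below is a minimal sketch of a defensive check against this class of bug. It uses a
      hypothetical helper, safe_vcpu_pids(), and is not the actual fix, which
      stops creating the fake PID table. The helper refuses to hand out thread IDs
      when the table is shorter than the vCPU count or contains a PID of
      0, which sched_setaffinity() would interpret as the calling thread.

      ```c
      #include <stddef.h>
      #include <sys/types.h>

      /* Hypothetical helper: copy the vCPU thread IDs that are safe to pass
       * to sched_setaffinity() into out. Returns the number copied, or -1
       * if the table is too short or holds an invalid PID. */
      static int safe_vcpu_pids(const pid_t *vcpupids, size_t nvcpupids,
                                size_t vcpus, pid_t *out)
      {
          /* Indexing past nvcpupids is exactly the bug described above. */
          if (nvcpupids < vcpus)
              return -1;

          for (size_t i = 0; i < vcpus; i++) {
              /* pid 0 means "the calling thread", i.e. libvirtd itself. */
              if (vcpupids[i] <= 0)
                  return -1;
              out[i] = vcpupids[i];
          }
          return (int)vcpus;
      }
      ```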
    • qemu: Create memory-backend-{ram,file} iff needed · f309db1f
      Michal Privoznik committed
      Libvirt BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1175397
      QEMU BZ:    https://bugzilla.redhat.com/show_bug.cgi?id=1170093
      
      In qemu there are two interesting arguments:
      
      1) -numa to create a guest NUMA node
      2) -object memory-backend-{ram,file} to tell qemu which memory
      region on which host's NUMA node it should allocate the guest
      memory from.
      
      Combining these two together we can instruct qemu to create a
      guest NUMA node that is tied to a host NUMA node. And it works
      just fine. However, depending on the machine type used, there might
      be some issues during migration when OVMF is enabled (see QEMU
      BZ). While this truly is a QEMU bug, we can help avoid it. The
      problem lies somewhere within the memory backend objects. That said,
      the fix on our side consists of putting those objects on
      the command line if and only if needed. For instance, while
      previously we would construct this (in all ways correct) command
      line:
      
          -object memory-backend-ram,size=256M,id=ram-node0 \
          -numa node,nodeid=0,cpus=0,memdev=ram-node0
      
      now we create just:
      
          -numa node,nodeid=0,cpus=0,mem=256
      
      because the backend object is obviously not tied to any specific
      host NUMA node.
      Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
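      The "iff needed" rule can be sketched as a small formatting function.
      This is an illustration only, with several assumptions. numa_cell and
      format_numa_cell() are hypothetical. Only the memory-backend-ram case
      is covered, and memory-backend-file for hugepages is ignored. The
      host-nodes and policy properties are shown as one plausible way to
      express the host binding.

      ```c
      #include <stdio.h>

      /* Hypothetical guest NUMA cell; host_nodes is NULL when the cell is
       * not tied to any host NUMA node. */
      typedef struct {
          int id;
          const char *cpus;
          unsigned long long mem_mib;
          const char *host_nodes;
      } numa_cell;

      /* Format the QEMU arguments for one cell into buf; returns the
       * snprintf() result. */
      static int format_numa_cell(const numa_cell *cell, char *buf, size_t len)
      {
          if (!cell->host_nodes) {
              /* No host binding: skip the backend object entirely. */
              return snprintf(buf, len, "-numa node,nodeid=%d,cpus=%s,mem=%llu",
                              cell->id, cell->cpus, cell->mem_mib);
          }
          /* Host binding requested: a memory backend object is needed. */
          return snprintf(buf, len,
                          "-object memory-backend-ram,size=%lluM,id=ram-node%d,"
                          "host-nodes=%s,policy=bind "
                          "-numa node,nodeid=%d,cpus=%s,memdev=ram-node%d",
                          cell->mem_mib, cell->id, cell->host_nodes,
                          cell->id, cell->cpus, cell->id);
      }
      ```

      For an unbound 256 MiB cell, the sketch produces the same
      "-numa node,nodeid=0,cpus=0,mem=256" form shown above.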
  3. 18 Dec 2014, 3 commits
  4. 17 Dec 2014, 15 commits
    • Fix error message on redirdev caps detection · 952f8a73
      Ján Tomko committed
    • conf: fix cannot start a guest have a shareable network iscsi hostdev · dddd8327
      Luyao Huang committed
      https://bugzilla.redhat.com/show_bug.cgi?id=1174569
      
      There's nothing we need to do for shared iSCSI devices in
      qemuAddSharedHostdev and qemuRemoveSharedHostdev. The iSCSI layer
      takes care of that for us.
      Signed-off-by: Luyao Huang <lhuang@redhat.com>
      Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
    • getstats: crawl backing chain for qemu · 3937ef9c
      Eric Blake committed
      Wire up backing chain recursion.  For the first time, it is now
      possible to get libvirt to expose that qemu tracks read statistics
      on backing files, as well as report maximum extent written on a
      backing file during a block-commit operation.
      
      For a running domain, where one of the two images has a backing
      file, I see the traditional output:
      
      $ virsh domstats --block testvm2
      Domain: 'testvm2'
        block.count=2
        block.0.name=vda
        block.0.path=/tmp/wrapper.qcow2
        block.0.rd.reqs=1
        block.0.rd.bytes=512
        block.0.rd.times=28858
        block.0.wr.reqs=0
        block.0.wr.bytes=0
        block.0.wr.times=0
        block.0.fl.reqs=0
        block.0.fl.times=0
        block.0.allocation=0
        block.0.capacity=1310720000
        block.0.physical=200704
        block.1.name=vdb
        block.1.path=/dev/sda7
        block.1.rd.reqs=0
        block.1.rd.bytes=0
        block.1.rd.times=0
        block.1.wr.reqs=0
        block.1.wr.bytes=0
        block.1.wr.times=0
        block.1.fl.reqs=0
        block.1.fl.times=0
        block.1.allocation=0
        block.1.capacity=1310720000
      
      vs. the new output:
      
      $ virsh domstats --block --backing testvm2
      Domain: 'testvm2'
        block.count=3
        block.0.name=vda
        block.0.path=/tmp/wrapper.qcow2
        block.0.rd.reqs=1
        block.0.rd.bytes=512
        block.0.rd.times=28858
        block.0.wr.reqs=0
        block.0.wr.bytes=0
        block.0.wr.times=0
        block.0.fl.reqs=0
        block.0.fl.times=0
        block.0.allocation=0
        block.0.capacity=1310720000
        block.0.physical=200704
        block.1.name=vda
        block.1.path=/dev/sda6
        block.1.backingIndex=1
        block.1.rd.reqs=0
        block.1.rd.bytes=0
        block.1.rd.times=0
        block.1.wr.reqs=0
        block.1.wr.bytes=0
        block.1.wr.times=0
        block.1.fl.reqs=0
        block.1.fl.times=0
        block.1.allocation=327680
        block.1.capacity=786432000
        block.2.name=vdb
        block.2.path=/dev/sda7
        block.2.rd.reqs=0
        block.2.rd.bytes=0
        block.2.rd.times=0
        block.2.wr.reqs=0
        block.2.wr.bytes=0
        block.2.wr.times=0
        block.2.fl.reqs=0
        block.2.fl.times=0
        block.2.allocation=0
        block.2.capacity=1310720000
      
      I may later do a patch that trims the output to avoid 0 stats,
      particularly for backing files (which are more likely to have
      0 stats, at least for write statistics when no block-commit
      is performed).  Also, I still plan to expose physical size
      information (qemu doesn't expose it yet, so it requires a stat,
      and for block devices, a further open/seek operation).  But
      this patch is good enough without worrying about that yet.
      
      * src/qemu/qemu_driver.c (QEMU_DOMAIN_STATS_BACKING): New internal
      enum bit.
      (qemuConnectGetAllDomainStats): Recognize new user flag, and pass
      details to...
      (qemuDomainGetStatsBlock): ...here, where we can do longer recursion.
      (qemuDomainGetStatsOneBlock): Output new field.
      Signed-off-by: Eric Blake <eblake@redhat.com>
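      The traversal behind the --backing output above can be sketched as
      a nested walk. It makes one pass over the disks and, for each disk,
      walks down its backing chain. The image and disk types and
      report_blocks() are hypothetical simplifications. backingIndex is
      approximated by the depth in the chain. That matches the example
      output but is an assumption.

      ```c
      #include <stdio.h>

      /* Hypothetical, simplified backing chain node (not virStorageSource). */
      typedef struct image {
          const char *path;
          const struct image *backing; /* NULL at the bottom of the chain */
      } image;

      typedef struct {
          const char *name;  /* target name, e.g. "vda" */
          const image *top;  /* active layer */
      } disk;

      /* Print one block.N group per reported image, descending into backing
       * files only when 'backing' is nonzero. Returns the number of groups,
       * i.e. the value to report as block.count. */
      static int report_blocks(const disk *disks, int ndisks, int backing,
                               FILE *out)
      {
          int n = 0;

          for (int d = 0; d < ndisks; d++) {
              const image *img = disks[d].top;

              for (int depth = 0; img; depth++) {
                  fprintf(out, "block.%d.name=%s\n", n, disks[d].name);
                  fprintf(out, "block.%d.path=%s\n", n, img->path);
                  if (depth > 0) /* assumption: backingIndex == chain depth */
                      fprintf(out, "block.%d.backingIndex=%d\n", n, depth);
                  n++;
                  img = backing ? img->backing : NULL;
              }
          }
          return n;
      }
      ```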
    • getstats: split block stats reporting for easier recursion · c2d380bf
      Eric Blake committed
      In order to report stats on backing chains, we need to separate
      the output of stats for one block from how we traverse blocks.
      
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Split...
      (qemuDomainGetStatsOneBlock): ...into new helper.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • getstats: prepare for dynamic block.count stat · 14ef1f62
      Eric Blake committed
      A coming patch will make it optionally possible to list backing
      chain block stats; in this mode of operation, block.count is no
      longer the number of <disks> in the domain, but the number of
      blocks in the array being reported.  We still want block.count
      listed first, but rather than iterate the tree twice (once to
      count, and once to list stats), it's easier to just touch things
      up after the fact.
      
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Compute count
      after the fact.
      Signed-off-by: Eric Blake <eblake@redhat.com>
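      The "touch things up after the fact" approach can be sketched with a
      flat list of name/value pairs. The sketch appends a placeholder for
      block.count first, then appends per-block entries while walking. It
      patches the placeholder once the real total is known. stat_list,
      stat_add() and add_block_stats() are hypothetical. chain_len stands
      in for the backing-chain walk.

      ```c
      #include <stdio.h>

      /* Hypothetical flat list of name=value statistics. */
      typedef struct {
          char name[64];
          unsigned long long value;
      } stat_entry;

      typedef struct {
          stat_entry items[32];
          int n;
      } stat_list;

      /* Append one entry; returns its index, or -1 when the list is full. */
      static int stat_add(stat_list *l, const char *name, unsigned long long v)
      {
          if (l->n >= (int)(sizeof(l->items) / sizeof(l->items[0])))
              return -1;
          snprintf(l->items[l->n].name, sizeof(l->items[0].name), "%s", name);
          l->items[l->n].value = v;
          return l->n++;
      }

      /* List block.count first, but only fill in its value after the walk,
       * instead of iterating the chains twice. Returns the block count. */
      static int add_block_stats(stat_list *l, const int *chain_len, int ndisks)
      {
          char name[64];
          int count = 0;
          int count_idx = stat_add(l, "block.count", 0); /* placeholder */

          if (count_idx < 0)
              return -1;
          for (int d = 0; d < ndisks; d++) {
              for (int layer = 0; layer < chain_len[d]; layer++) {
                  snprintf(name, sizeof(name), "block.%d.rd.reqs", count);
                  if (stat_add(l, name, 0) < 0)
                      return -1;
                  count++;
              }
          }
          l->items[count_idx].value = (unsigned long long)count; /* touch up */
          return count;
      }
      ```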
    • getstats: report block sizes for offline domains · 596a1371
      Eric Blake committed
      The prior refactoring can now be put to use. With the same domain
      as the earlier commit 7b499262 (one qcow2 disk and an empty
      cdrom drive):
      $ virsh domstats --block foo
      Domain: 'foo'
        block.count=2
        block.0.name=hda
        block.0.path=/var/lib/libvirt/images/foo.qcow2
        block.0.allocation=1309614080
        block.0.capacity=42949672960
        block.0.physical=1309671424
        block.1.name=hdc
      
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Use
      qemuStorageLimitsRefresh to report offline statistics.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • qemu: refactor blockinfo data gathering · 8de6544e
      Eric Blake committed
      Create a helper function that can be reused for gathering block
      info from virDomainListGetStats.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Split guts...
      (qemuStorageLimitsRefresh): ...into new helper function.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • qemu: fix bugs in blockstats · 0282ca45
      Eric Blake committed
      The documentation for virDomainBlockInfo was confusing: it stated
      that 'physical' was the size of the container, then gave an example
      of it being the amount of storage used by a sparse file (that is,
      for a sparse raw image on a regular file, the wording implied
      capacity==physical, while allocation was smaller; but the example
      instead claimed physical==allocation).  Since we use 'physical' for
      the last offset of a block device, we should do likewise for
      regular files.
      
      Furthermore, the example claimed that for a qcow2 regular file,
      allocation==physical.  At the time the code was first written,
      this was true (qcow2 files were allocated sequentially, and were
      never sparse, so the last sector written happened to also match
      the disk space occupied); but modern qemu does much better and
      can punch holes for a qcow2 with allocation < physical.
      
      Basically, after this patch, the three fields are now reliably
      mapped as:
       'capacity' - how much storage the guest can see (equal to
         physical for raw images, determined by image metadata otherwise)
       'allocation' - how much storage the image occupies (similar to
         what 'du' would report)
       'physical' - the last offset of the image (similar to what 'ls'
         would report)
      
      'capacity' can be larger than 'physical' (such as for a qcow2
      image that does not vary much from a backing file) or smaller
      (such as for a qcow2 file with lots of internal snapshots).
      Likewise, 'allocation' can be (slightly) larger than 'physical'
      (such as counting the tail of cluster allocations required to
      round a file size up to filesystem granularity) or smaller
      (for a sparse file).  A block-resize operation changes capacity
      (which, for raw images, also changes physical); many non-raw
      images automatically grow physical and allocation as necessary
      when starting with an allocation smaller than capacity; and even
      when capacity and physical stay unchanged, allocation can change
      when converting sectors from holes to data or back.
      
      Note that this does not change semantics for qcow2 images stored
      on block devices; there, we still rely on qemu to report the
      highest written extent for allocation.  So using this API to
      track when to extend a block device because a qcow2 image is
      about to exceed a threshold will not see any changes.
      
      Also, note that virStorageVolInfo is unfortunately limited to
      just 'capacity' and 'allocation' (we can't expand it to add
      'physical', although we can expand the XML to add it there);
      historically, that struct's 'allocation' value has reported
      file size for qcow2 files (what this patch terms 'physical'
      for a domain block device), but disk usage for raw files (what
      this patch terms 'allocation').  So follow-up patches will be
      needed to make storage volumes report the same allocation
      values and get at physical values, where those differ.
      
      * include/libvirt/libvirt-domain.h (_virDomainBlockInfo): Tweak
      documentation to match saner definition.
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): For regular
      files, physical size is capacity, not allocation.
      Signed-off-by: Eric Blake <eblake@redhat.com>
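      For a regular file, the 'allocation' and 'physical' definitions in
      this commit map onto fields of stat(2). The helper below is a
      hypothetical sketch, not qemuDomainGetBlockInfo(). It assumes Linux
      semantics, where st_blocks counts 512-byte units. It leaves out
      'capacity', because for non-raw images that value comes from image
      metadata.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <sys/stat.h>
      #include <sys/types.h>

      /* Hypothetical helper: 'physical' is the last offset (what ls shows),
       * 'allocation' is the space actually occupied (what du shows). */
      static int file_sizes(const char *path,
                            unsigned long long *allocation,
                            unsigned long long *physical)
      {
          struct stat sb;

          if (stat(path, &sb) < 0)
              return -1;
          *physical = (unsigned long long)sb.st_size;
          /* st_blocks is in 512-byte units on Linux, independent of st_blksize. */
          *allocation = (unsigned long long)sb.st_blocks * 512ULL;
          return 0;
      }
      ```

      On a sparse file, allocation comes out smaller than physical. That
      is the case the old documentation described incorrectly.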
    • getstats: rearrange blockinfo gathering · 05e702cf
      Eric Blake committed
      Ultimately, we want to avoid read()ing a file while qemu is running.
      We still have to open() block devices to determine their physical
      size, but that is safer.  This patch rearranges code to group
      together all code that reads the image, to make it easier for later
      patches to skip the metadata collection when possible.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Check for empty
      disk up front.  Place metadata reading next to use.
      Signed-off-by: Eric Blake <eblake@redhat.com>
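      The physical size of a block device can be determined without
      read()ing it by calling open() and then lseek() to the end. This is
      needed because stat() reports st_size as 0 for block device nodes on
      Linux. The helper below is a hypothetical sketch of that idea, not
      libvirt's code.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <fcntl.h>
      #include <sys/types.h>
      #include <unistd.h>

      /* Hypothetical helper: return the last offset of a block device or
       * regular file, or -1 on error. Only open() and lseek() are used;
       * no image data is read. */
      static long long end_offset(const char *path)
      {
          int fd = open(path, O_RDONLY);
          if (fd < 0)
              return -1;

          off_t end = lseek(fd, 0, SEEK_END);
          close(fd);
          return end < 0 ? -1 : (long long)end;
      }
      ```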
    • getstats: perform recursion in monitor collection · b1802714
      Eric Blake committed
      When requested in a later patch, the QMP command results are now
      examined recursively.  As qemu_driver will eventually have to
      read items out of the hash table as stored by this patch, the
      computation of backing alias string is done in a shared location.
      
      * src/qemu/qemu_domain.h (qemuDomainStorageAlias): New prototype.
      * src/qemu/qemu_domain.c (qemuDomainStorageAlias): Implement it.
      * src/qemu/qemu_monitor_json.c
      (qemuMonitorJSONGetOneBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacityOne): Perform recursion.
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Update callers.
      Signed-off-by: Eric Blake <eblake@redhat.com>
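      A shared qemuDomainStorageAlias() ensures that the code storing
      per-layer stats and the code looking them up compute the same key.
      For illustration only, the sketch below assumes the key is the
      device alias at depth 0 and "<device>.<depth>" below it.
      storage_alias() is a hypothetical stand-in, and the real format may
      differ.

      ```c
      #include <stdio.h>
      #include <stdlib.h>

      /* Hypothetical counterpart of qemuDomainStorageAlias(). The
       * "<device>.<depth>" format is an assumption of this sketch.
       * Returns a malloc'd string the caller must free, or NULL. */
      static char *storage_alias(const char *device, int depth)
      {
          char *alias;
          int len;

          if (depth == 0)
              len = snprintf(NULL, 0, "%s", device);
          else
              len = snprintf(NULL, 0, "%s.%d", device, depth);
          if (len < 0 || !(alias = malloc((size_t)len + 1)))
              return NULL;

          if (depth == 0)
              snprintf(alias, (size_t)len + 1, "%s", device);
          else
              snprintf(alias, (size_t)len + 1, "%s.%d", device, depth);
          return alias;
      }
      ```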
    • getstats: prepare monitor collection for recursion · 7b11f5e5
      Eric Blake committed
      A future patch will allow recursion into backing chains when
      collecting block stats.  This patch should not change behavior,
      but merely moves out the common code that will be reused once
      recursion is enabled, and adds the parameter that will turn on
      recursion.
      
      * src/qemu/qemu_monitor.h (qemuMonitorGetAllBlockStatsInfo)
      (qemuMonitorBlockStatsUpdateCapacity): Add recursion parameter,
      although it is ignored for now.
      * src/qemu/qemu_monitor.c (qemuMonitorGetAllBlockStatsInfo)
      (qemuMonitorBlockStatsUpdateCapacity): Likewise.
      * src/qemu/qemu_monitor_json.h
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Likewise.
      * src/qemu/qemu_monitor_json.c
      (qemuMonitorJSONGetAllBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacity): Add parameter, and
      split...
      (qemuMonitorJSONGetOneBlockStatsInfo)
      (qemuMonitorJSONBlockStatsUpdateCapacityOne): ...into helpers.
      (qemuMonitorJSONGetBlockStatsInfo): Update caller.
      * src/qemu/qemu_driver.c (qemuDomainGetStatsBlock): Update caller.
      * src/qemu/qemu_migration.c (qemuMigrationCookieAddNBD): Likewise.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • qemu: let blockinfo reuse virStorageSource · 89646e69
      Eric Blake committed
      Right now, grabbing blockinfo always calls stat on the disk, then
      opens the image to determine the capacity, using a throw-away
      virStorageSourcePtr.  This has a couple of drawbacks:
      
      1. We are calling stat and opening a file on every invocation of
      the API.  However, there are cases where the stats should NOT be
      changing between successive calls (if a domain is running, no
      one should be changing the physical size of a block device or raw
      image behind our backs; capacity of read-only files should not
      be changing; and we are the gateway to the block-resize command
      to know when the capacity of read-write files should be changing).
      True, we still have to use stat in some cases (a sparse raw file
      changes allocation if it is read-write and the amount of holes is
      changing, and a read-write qcow2 image stored in a file changes
      physical size if it was not fully pre-allocated).  But for
      read-only images, even this should be something we can remember
      from the previous time, rather than repeating every call.
      
      2. We want to enhance the power of virDomainListGetStats, by
      sharing code.  But we already have a virStorageSourcePtr for
      each disk, and it would be easier to reuse the common structure
      than to have to worry about the one-off virDomainBlockInfoPtr.
      
      While this patch does not optimize reuse of information in point
      1, it does get us closer to being able to do so, by updating a
      structure that survives between consecutive calls.
      
      * src/util/virstoragefile.h (_virStorageSource): Add physical, to
      mirror virDomainBlockInfo; rearrange fields to match public struct.
      (virStorageSourceCopy): Copy the new field.
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Store into
      storage source, then copy to block info.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • qemu: refactor blockinfo job handling · a20c3aaf
      Eric Blake committed
      In order for a future patch to virDomainListGetStats to reuse
      some code for determining disk usage of offline domains, we
      need to make it easier to pull out part of the guts of grabbing
      blockinfo.  The current implementation grabs a job fairly late
      in the game, while getstats will already own a job; reordering
      things so that the job is always grabbed up front in both
      functions will make it easier to pull out the common code.
      This patch results in grabbing a job in cases where one was not
      previously needed, but as it is a query job, it should not be
      noticeably slower.
      
      This patch touches the same code as the fix for CVE-2014-6458
      (commit b7992595); in that patch, we avoided hotplug changing
      a disk reference during the time of obtaining a monitor lock
      by copying all data we needed and no longer referencing disk;
      this patch goes the other way and ensures that by holding the
      job, the disk cannot be changed so we no longer need to worry
      about the disk being invalidated across the monitor lock.
      
      * src/qemu/qemu_driver.c (qemuDomainGetBlockInfo): Rearrange job
      control to be outside of disk information.
      Signed-off-by: Eric Blake <eblake@redhat.com>
    • qemu: Free saved error in qemuDomainSetVcpusFlags · 4d1e3943
      Martin Kletzander committed
      Commit e3435caf added cleanup code to qemuDomainSetVcpusFlags() that was
      not supposed to reset the error.  The usual procedure was followed,
      saving the error to a temporary variable, but the saved copy was never
      freed; it was leaked instead.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
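      The save/restore pattern this commit refers to can be sketched as
      follows. All names here are hypothetical: err_info, last_error,
      save_last_error(), set_error(), free_error() and
      cleanup_preserving_error(). The saved copy is heap-allocated, so
      restoring it is not enough; it must also be freed.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <stdlib.h>
      #include <string.h>

      /* Hypothetical error object and "last error" slot. */
      typedef struct {
          int code;
          char *message;
      } err_info;

      static err_info last_error;

      static void set_error(int code, const char *msg)
      {
          free(last_error.message);
          last_error.code = code;
          last_error.message = msg ? strdup(msg) : NULL;
      }

      /* Return a heap copy of the current error (caller frees it). */
      static err_info *save_last_error(void)
      {
          err_info *copy = calloc(1, sizeof(*copy));
          if (!copy)
              return NULL;
          copy->code = last_error.code;
          copy->message = last_error.message ? strdup(last_error.message) : NULL;
          return copy;
      }

      static void free_error(err_info *err)
      {
          if (!err)
              return;
          free(err->message);
          free(err);
      }

      /* Run cleanup without clobbering the caller's error. Dropping the
       * final free_error() call reproduces the leak described above. */
      static void cleanup_preserving_error(void (*cleanup)(void))
      {
          err_info *saved = save_last_error();

          cleanup();
          if (saved)
              set_error(saved->code, saved->message);
          free_error(saved);
      }
      ```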
    • qemu: Add missing goto error in qemuRestoreCgroupState · 86759ec6
      Martin Kletzander committed
      Commit af2a1f05 tried to clearly separate each condition in
      qemuRestoreCgroupState() for the sake of readability; however, one
      condition body somehow went missing.  That means that the body of the
      next condition got executed only if both conditions were true, which
      is impossible, thus resulting in dead code and a logic error.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
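      The shape of this bug and its fix can be shown with a hypothetical
      sketch. When the body of one if statement goes missing, the next if
      statement silently becomes its body. The function and variable names
      are illustrative only and do not come from qemuRestoreCgroupState().

      ```c
      #include <stdbool.h>

      /* Hypothetical stand-ins for the individual restore steps. */
      static bool get_mask_ok, parse_mask_ok;
      static int steps_run;

      static bool get_mask(void)   { steps_run++; return get_mask_ok; }
      static bool parse_mask(void) { steps_run++; return parse_mask_ok; }

      /* Buggy shape, with the first body missing:
       *
       *     if (!get_mask())
       *     if (!parse_mask())
       *         goto error;
       *
       * The goto then runs only when both steps fail. Giving every
       * condition its own "goto error" restores the intended logic. */
      static int restore_state(void)
      {
          if (!get_mask())
              goto error;

          if (!parse_mask())
              goto error;

          return 0;

       error:
          return -1;
      }
      ```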
  5. 16 Dec 2014, 4 commits
  6. 15 Dec 2014, 7 commits
  7. 14 Dec 2014, 2 commits
    • qemu: add a qemuInterfaceStopDevices(), called when guest CPUs stop · c5a54917
      Laine Stump committed
      We now have a qemuInterfaceStartDevices() which does the final
      activation needed for the host-side tap/macvtap devices that are used
      for qemu network connections. It will soon make sense to have the
      converse qemuInterfaceStopDevices() which will undo whatever was done
      during qemuInterfaceStartDevices().
      
      A function to "stop" a single device has also been added, and is
      called from the appropriate place in qemuDomainDetachNetDevice(),
      although this is currently unnecessary - the device is going to
      immediately be deleted anyway, so any extra "deactivation" will be for
      naught. The call is included for completeness, though, in anticipation
      that in the future there may be some required action that *isn't*
      nullified by deleting the device.
      
      This patch is a part of a more complete fix for:
      
        https://bugzilla.redhat.com/show_bug.cgi?id=1081461
    • qemu: always call qemuInterfaceStartDevices() when starting CPUs · 879c13d6
      Laine Stump committed
      The patch that added qemuInterfaceStartDevices() (upstream commit
      82977058) had an extra conditional to prevent calling it if the
      reason for starting the CPUs was VIR_DOMAIN_RUNNING_UNPAUSED or
      VIR_DOMAIN_RUNNING_SAVE_CANCELED.  This
      was put in by the author as the result of a reviewer asking if it was
      necessary to ifup the interfaces in *all* occasions (because these
      were the two cases where the CPU would have already been started (and
      stopped) once, so the interface would already be ifup'ed).
      
      It turns out that, as long as there is no corresponding
      qemuInterfaceStopDevices() to ifdown the interfaces anytime the CPUs
      are stopped, neglecting to ifup when reason is RUNNING_UNPAUSED or
      RUNNING_SAVE_CANCELED doesn't cause any problems (because it just
      happens that the interface will have already been ifup'ed by a prior
      call when the CPU was previously started for some other reason).
      
      However, it also doesn't *help*, and there will soon be a
      qemuInterfaceStopDevices() function which *will* ifdown these
      interfaces when the guest CPUs are stopped, and once that is done, the
      interfaces will be left down in some cases when they should be up (for
      example, if a domain is paused and then unpaused).
      
      So, this patch is removing the condition in favor of always calling
      qemuInterfaceStartDevices() when the guest CPUs are started.
      
      This patch (and the aforementioned patch) resolve:
      
        https://bugzilla.redhat.com/show_bug.cgi?id=1081461
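      The interaction described in these two commits can be modeled in a
      few lines. The sketch is hypothetical: guest, start_cpus(),
      start_cpus_old(), stop_cpus() and the run_reason values are
      illustrative names, not libvirt's. Once a StopDevices counterpart
      exists, the old conditional leaves the interface down after a
      pause/unpause cycle. Always starting the devices brings it back up.

      ```c
      #include <stdbool.h>

      /* Hypothetical run-state reasons, loosely modeled on those above. */
      typedef enum {
          RUNNING_BOOTED,
          RUNNING_UNPAUSED,
          RUNNING_SAVE_CANCELED
      } run_reason;

      typedef struct {
          bool iface_up; /* host-side tap/macvtap link state */
      } guest;

      static void interface_start_devices(guest *g) { g->iface_up = true; }
      static void interface_stop_devices(guest *g)  { g->iface_up = false; }

      /* New behavior: bring interfaces up on every CPU start. */
      static void start_cpus(guest *g, run_reason reason)
      {
          (void)reason; /* deliberately not used to skip the call */
          interface_start_devices(g);
      }

      /* Old behavior, kept for comparison: skipped for two reasons. */
      static void start_cpus_old(guest *g, run_reason reason)
      {
          if (reason != RUNNING_UNPAUSED && reason != RUNNING_SAVE_CANCELED)
              interface_start_devices(g);
      }

      static void stop_cpus(guest *g)
      {
          interface_stop_devices(g);
      }
      ```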
  8. 11 Dec 2014, 2 commits
  9. 10 Dec 2014, 2 commits
  10. 09 Dec 2014, 1 commit