1. 15 Apr, 2015 1 commit
  2. 14 Apr, 2015 1 commit
  3. 08 Apr, 2015 2 commits
    • M
      qemuProcessHook: Call virNuma*() only when needed · ea576ee5
      Michal Privoznik authored
      https://bugzilla.redhat.com/show_bug.cgi?id=1198645
      
      Once upon a time, there was a little domain. And the domain was pinned
      onto a NUMA node and had not fully allocated its memory:
      
        <memory unit='KiB'>2355200</memory>
        <currentMemory unit='KiB'>1048576</currentMemory>
      
        <numatune>
          <memory mode='strict' nodeset='0'/>
        </numatune>
      
      Oh little me, said the domain, what will I do with so little memory.
      If I only had a few megabytes more. But the old admin noticed the
      whimpering, barely audible to the untrained human ear. And good admin
      he was, he gave the domain yet more memory. But the old NUMA topology
      witch forbade allocating more memory on node zero. So he decided to
      allocate it on a different node:
      
      virsh # numatune little_domain --nodeset 0-1
      
      virsh # setmem little_domain 2355200
      
      The little domain was happy. For a while. Until bad, sharp teeth
      shaped creature came. Every process in the system was afraid of him.
      The OOM Killer they called him. Oh no, he's after the little domain.
      There's no escape.
      
      Do you kids know why? Because when the little domain was born, her
      father, Libvirt, called numa_set_membind(). So even if the admin
      allowed her to allocate memory from other nodes in the cgroups, the
      membind() forbade it.
      
      So what's the lesson? Libvirt should rely on cgroups whenever
      possible and use numa_set_membind() only as a last-ditch effort.
      Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
      ea576ee5
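      To illustrate the lesson above, here is a minimal, hedged sketch of the
      two mechanisms being contrasted: libnuma's numa_set_membind() hard-binds
      all future allocations and cannot be relaxed later, whereas a cpuset
      cgroup can be widened at runtime (e.g. via numatune). The cgroup path
      below is purely illustrative.
      
        #include <stdio.h>
        #include <numa.h>   /* libnuma: link with -lnuma */
        
        /* Hard-bind every future allocation of this process to node 0.
         * The kernel enforces this mask even if cpuset.mems is later
         * widened, which is what starved the little domain above. */
        static void bind_with_libnuma(void)
        {
            struct bitmask *mask = numa_allocate_nodemask();
            numa_bitmask_setbit(mask, 0);
            numa_set_membind(mask);
            numa_bitmask_free(mask);
        }
        
        /* Restrict allocations via the cpuset cgroup instead; this can be
         * relaxed at runtime ("virsh numatune ... --nodeset 0-1").
         * The cgroup path is an assumption for illustration only. */
        static void bind_with_cgroup(void)
        {
            FILE *f = fopen("/sys/fs/cgroup/cpuset/machine/little_domain/cpuset.mems", "w");
            if (!f)
                return;
            fprintf(f, "0");
            fclose(f);
        }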
    • M
      qemu: fix crash in qemuProcessAutoDestroy · 7578cc17
      Michael Chapman authored
      The destination libvirt daemon in a migration may segfault if the client
      disconnects immediately after the migration has begun:
      
        # virsh -c qemu+tls://remote/system list --all
         Id    Name                           State
        ----------------------------------------------------
        ...
      
        # timeout --signal KILL 1 \
            virsh migrate example qemu+tls://remote/system \
              --verbose --compressed --live --auto-converge \
              --abort-on-error --unsafe --persistent \
              --undefinesource --copy-storage-all --xml example.xml
        Killed
      
        # virsh -c qemu+tls://remote/system list --all
        error: failed to connect to the hypervisor
        error: unable to connect to server at 'remote:16514': Connection refused
      
      The crash is in:
      
         1531 void
         1532 qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
         1533 {
         1534     qemuDomainObjPrivatePtr priv = obj->privateData;
         1535     qemuDomainJob job = priv->job.active;
         1536
         1537     priv->jobs_queued--;
      
      Backtrace:
      
        #0  at qemuDomainObjEndJob at qemu/qemu_domain.c:1537
        #1  in qemuDomainRemoveInactive at qemu/qemu_domain.c:2497
        #2  in qemuProcessAutoDestroy at qemu/qemu_process.c:5646
        #3  in virCloseCallbacksRun at util/virclosecallbacks.c:350
        #4  in qemuConnectClose at qemu/qemu_driver.c:1154
        ...
      
      qemuDomainRemoveInactive calls virDomainObjListRemove, which in this
      case is holding the last remaining reference to the domain.
      qemuDomainRemoveInactive then calls qemuDomainObjEndJob, but the domain
      object has been freed and poisoned by then.
      
      This patch bumps the domain's refcount until qemuDomainRemoveInactive
      has completed. We also ensure qemuProcessAutoDestroy does not return the
      domain to virCloseCallbacksRun to be unlocked in this case. There is
      similar logic in bhyveProcessAutoDestroy and lxcProcessAutoDestroy
      (which call virDomainObjListRemove directly).
      Signed-off-by: Michael Chapman <mike@very.puzzling.org>
      7578cc17
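      A condensed, hedged sketch of the shape of the fix (simplified; the
      wrapper function below is illustrative, only the calls it makes are the
      ones named in the commit):
      
        /* Hold an extra reference across the removal so that the later
         * qemuDomainObjEndJob() never runs on a freed, poisoned object. */
        static void
        removeInactiveExample(virQEMUDriverPtr driver, virDomainObjPtr vm)
        {
            virObjectRef(vm);                    /* the list may hold the last ref */
            virDomainObjListRemove(driver->domains, vm);
            /* job teardown still sees a live object here */
            virObjectUnref(vm);
        }
      
      In the same spirit, qemuProcessAutoDestroy clears its domain pointer so
      that virCloseCallbacksRun does not try to unlock an object that may
      already be gone.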
  4. 02 Apr, 2015 3 commits
  5. 31 Mar, 2015 1 commit
    • P
      qemu: blockjob: Synchronously update backing chain in XML on ABORT/PIVOT · 630ee5ac
      Peter Krempa authored
      When the synchronous pivot option is selected, libvirt would not update
      the backing chain until the job was exited. Some applications then
      received invalid data as their job serialized first.
      
      This patch removes polling to wait for the ABORT/PIVOT job completion
      and replaces it with a condition. If a synchronous operation is
      requested the update of the XML is executed in the job of the caller of
      the synchronous request. Otherwise the monitor event callback uses a
      separate worker to update the backing chain with a new job.
      
      This is a regression since commit 1a92c719.
      
      When the ABORT job is finished synchronously you get the following call
      stack:
       #0  qemuBlockJobEventProcess
       #1  qemuDomainBlockJobImpl
       #2  qemuDomainBlockJobAbort
       #3  virDomainBlockJobAbort
      
      While previously or while using the _ASYNC flag you'd get:
       #0  qemuBlockJobEventProcess
       #1  processBlockJobEvent
       #2  qemuProcessEventHandler
       #3  virThreadPoolWorker
      630ee5ac
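      The general pattern described above - replacing a polling loop with a
      condition that the event handler signals - looks roughly like this
      generic pthread sketch (the struct and field names are illustrative,
      not libvirt's):
      
        #include <pthread.h>
        #include <stdbool.h>
        
        struct blockjob_sync {
            pthread_mutex_t lock;
            pthread_cond_t  cond;
            bool            finished;      /* set by the event handler */
        };
        
        /* Caller of the synchronous ABORT/PIVOT: sleep on the condition
         * instead of polling, then update the XML in the caller's own job. */
        static void wait_for_completion(struct blockjob_sync *s)
        {
            pthread_mutex_lock(&s->lock);
            while (!s->finished)
                pthread_cond_wait(&s->cond, &s->lock);
            /* update the backing chain here, inside the caller's job */
            pthread_mutex_unlock(&s->lock);
        }
        
        /* Monitor event callback: wake the synchronous waiter; for async
         * callers a separate worker updates the chain in its own job. */
        static void on_block_job_event(struct blockjob_sync *s)
        {
            pthread_mutex_lock(&s->lock);
            s->finished = true;
            pthread_cond_signal(&s->cond);
            pthread_mutex_unlock(&s->lock);
        }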
  6. 26 Mar, 2015 1 commit
  7. 23 Mar, 2015 1 commit
  8. 19 Mar, 2015 1 commit
    • L
      util: clean up #includes of virnetdevopenvswitch.h · 451547a4
      Laine Stump authored
      virnetdevopenvswitch.h declares a few functions that can be called to
      add ports to and remove them from OVS bridges, and retrieve the
      migration data for a port. It does not contain any data definitions
      that are used by domain_conf.h. But for some reason, domain_conf.h
      was #including virnetdevopenvswitch.h anyway; only the files that
      actually call those functions should be directly #including it. This
      adds a few lines to the project, but saves all the files that don't
      need it from the extra compilation, and makes the dependencies more
      clear cut.
      451547a4
  9. 18 Mar, 2015 2 commits
  10. 17 Mar, 2015 1 commit
  11. 16 Mar, 2015 5 commits
    • J
      Convert virDomainVcpuPinFindByVcpu into virDomainPinFindByVcpu · a8a89270
      John Ferlan authored
      Since both the Vcpu and IOThreads code use the same APIs, alter the
      naming of the APIs to remove the "Vcpu"-specific reference.
      a8a89270
    • J
      Convert virDomainVcpuPinDefPtr to virDomainPinDefPtr · 59ba7023
      John Ferlan authored
      As pointed out by jtomko in his review of the IOThreads pinning code:
      
      http://www.redhat.com/archives/libvir-list/2015-March/msg00495.html
      
      there are some comments sprinkled in indicating IOThreads were using
      the same structure as the VcpuPin code...
      
      This is the first patch of a few that will change the virDomainVcpuPin*
      structures and code to just virDomainPin* - starting with the data
      structure naming...
      59ba7023
    • P
      conf: Replace access to def->mem.max_balloon with accessor functions · 4f9907cd
      Peter Krempa authored
      As there are two possible approaches to defining a domain's memory size -
      one used with legacy, non-NUMA VMs, configured in the <memory> element,
      and a per-node based approach on NUMA machines - the user needs to make
      sure that both are specified correctly in the NUMA case.
      
      To avoid this burden on the user I'd like to replace the NUMA case with
      automatic totaling of the memory size. To achieve this I need to replace
      direct access to virDomainMemtune's 'max_balloon' field with
      two separate getters depending on the desired size.
      
      The two sizes are needed as:
      1) Startup memory size doesn't include memory modules in some
      hypervisors.
      2) After startup these count as the usable memory size.
      
      Note that the comments for the functions are future-aware and document
      the state that will be present after a few later patches.
      4f9907cd
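      A hedged sketch of what "automatic totaling" can look like; the struct
      and function names below are hypothetical stand-ins, not libvirt's
      actual accessors:
      
        #include <stddef.h>
        
        /* Hypothetical model: per-node sizes are summed for NUMA guests,
         * the legacy <memory> value is used otherwise. */
        struct example_numa_node { unsigned long long mem_kib; };
        
        struct example_def {
            unsigned long long max_balloon;   /* legacy <memory>, in KiB */
            size_t nnodes;
            struct example_numa_node *nodes;
        };
        
        /* Startup size: what the guest boots with (memory modules excluded). */
        static unsigned long long
        example_get_initial_memory(const struct example_def *def)
        {
            unsigned long long total = 0;
            size_t i;
        
            if (def->nnodes == 0)
                return def->max_balloon;
        
            for (i = 0; i < def->nnodes; i++)
                total += def->nodes[i].mem_kib;
            return total;
        }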
    • P
      qemu: event: Don't fiddle with disk backing trees without a job · 1a92c719
      Peter Krempa authored
      Surprisingly, we did not grab a VM job when a block job finished, and
      we'd happily rewrite the backing chain data. This made it possible to
      crash libvirt when two backing chain updates were queued in quick
      succession, and caused other badness.
      
      To fix it, add yet another handler to the helper thread that handles
      monitor events that require a job.
      1a92c719
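      The split the commit describes, sketched as two hedged fragments using
      the helpers it names (the enum value and exact signatures should be
      treated as assumptions):
      
        /* Monitor event callback (no job held): never touch the backing
         * chain here, just queue the work for the driver's worker pool. */
        processEvent->eventType = QEMU_PROCESS_EVENT_BLOCK_JOB;
        virThreadPoolSendJob(driver->workerPool, 0, processEvent);
        
        /* Worker thread: only rewrite disk backing data once a job is held. */
        if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0)
            return;
        qemuBlockJobEventProcess(driver, vm, disk, type, status);
        qemuDomainObjEndJob(driver, vm);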
    • P
      5c634730
  12. 03 Mar, 2015 4 commits
  13. 21 Feb, 2015 1 commit
  14. 20 Feb, 2015 1 commit
  15. 19 Feb, 2015 2 commits
    • M
      qemuProcessHandleBlockJob: Take status into account · 76c61cdc
      Michal Privoznik authored
      Upon BLOCK_JOB_COMPLETED event delivery, we check if the job has
      completed (in qemuMonitorJSONHandleBlockJobImpl()). To give a better
      picture, the event looks something like this:
      
      {"timestamp": {"seconds": 1423582694, "microseconds": 372666}, "event":
      "BLOCK_JOB_COMPLETED", "data": {"device": "drive-virtio-disk0", "len":
      8412790784, "offset": 409993216, "speed": 8796093022207, "type":
      "mirror", "error": "No space left on device"}}
      
      If "len" does not equal "offset" it's considered an error, and we can
      clearly see the "error" field filled in. However, later in the event
      processing this case was handled no differently from the case of the
      job being aborted via a separate API. It's time that we start
      differentiating these two because of future work.
      Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
      76c61cdc
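      A hedged, simplified sketch of the check described above (loosely
      modelled on qemuMonitorJSONHandleBlockJobImpl; error handling trimmed):
      
        /* "len" != "offset" together with a filled-in "error" field means
         * the job failed, which is not the same as a user-requested abort. */
        unsigned long long offset = 0, len = 0;
        const char *error = virJSONValueObjectGetString(data, "error");
        int event = VIR_DOMAIN_BLOCK_JOB_COMPLETED;
        
        if (virJSONValueObjectGetNumberUlong(data, "offset", &offset) == 0 &&
            virJSONValueObjectGetNumberUlong(data, "len", &len) == 0 &&
            (offset != len || error))
            event = VIR_DOMAIN_BLOCK_JOB_FAILED;   /* differentiate from CANCELED */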
    • M
      qemuProcessHandleBlockJob: Set disk->mirrorState more often · c37943a0
      Michal Privoznik authored
      Currently, upon BLOCK_JOB_* event, disk->mirrorState is not updated
      each time. The callback code handling the events checks if a blockjob
      was started via our public APIs prior to setting the mirrorState.
      However, some block jobs may be started internally (e.g. during
      storage migration), in which case we don't bother with setting
      disk->mirror (there's nothing we can set it to anyway), or other
      fields. But it will come in handy if we update the mirrorState in these
      cases too. The event wasn't delivered just for fun - we've started the
      job after all.
      
      So, in this commit, the mirrorState is set to whatever job status
      we've obtained. Of course, there are some actions on some statuses
      that we want to perform. But instead of if {} else if {} else {} ...
      enumeration, let's move to switch().
      Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
      c37943a0
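      The switch() the commit moves to, as a hedged sketch; the exact
      status-to-mirrorState mapping below is illustrative, only the enum
      names are libvirt's:
      
        disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_NONE;
        
        switch ((virConnectDomainEventBlockJobStatus) status) {
        case VIR_DOMAIN_BLOCK_JOB_READY:
            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_READY;
            break;
        case VIR_DOMAIN_BLOCK_JOB_FAILED:
        case VIR_DOMAIN_BLOCK_JOB_CANCELED:
            disk->mirrorState = VIR_DOMAIN_DISK_MIRROR_STATE_ABORT;
            break;
        case VIR_DOMAIN_BLOCK_JOB_COMPLETED:
            /* job finished: mirrorState stays NONE, other cleanup goes here */
            break;
        case VIR_DOMAIN_BLOCK_JOB_LAST:
            break;
        }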
  16. 13 Feb, 2015 1 commit
  17. 12 Feb, 2015 2 commits
    • D
      qemu: fix setting of VM CPU affinity with TCG · a103bb10
      Daniel P. Berrange authored
      In a previous commit I fixed the incorrect handling of vcpu pids
      for TCG mode QEMU:
      
        commit b07f3d82
        Author: Daniel P. Berrange <berrange@redhat.com>
        Date:   Thu Dec 18 16:34:39 2014 +0000
      
          Don't setup fake CPU pids for old QEMU
      
          The code assumes that def->vcpus == nvcpupids, so when we setup
          fake CPU pids for old QEMU with nvcpupids == 1, we cause the
          later code to read off the end of the array. This has fun results
          like sched_setaffinity(0, ...) which changes libvirtd's own CPU
          affinity, or even better sched_setaffinity($RANDOM, ...) which
          changes the affinity of a random OS process.
      
      The intent was that this would merely disable the ability to set
      per-vCPU affinity. It should still have been possible to set VM
      level host CPU affinity.
      
      Unfortunately, when you set  <vcpu cpuset='0-1'>4</vcpu>, the XML
      parser will internally take this & initialize an entry in the
      def->cputune.vcpupin array for every VCPU. IOW this is implicitly
      being treated as
      
        <cputune>
          <vcpupin cpuset='0-1' vcpu='0'/>
          <vcpupin cpuset='0-1' vcpu='1'/>
          <vcpupin cpuset='0-1' vcpu='2'/>
          <vcpupin cpuset='0-1' vcpu='3'/>
        </cputune>
      
      Even more fun, the faked cputune elements are hidden from view when
      querying the live XML, because their cpuset mask is the same as the
      VM default cpumask.
      
      The upshot was that it was impossible to set VM level CPU affinity.
      
      To fix this we must update qemuProcessSetVcpuAffinities so that it
      only reports a fatal error if the per-VCPU cpu mask is different
      from the VM level cpu mask.
      Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
      a103bb10
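      The relaxed check the commit describes, as a hedged sketch of the logic
      in qemuProcessSetVcpuAffinities (field names follow libvirt of that era
      but are best treated as illustrative):
      
        /* No usable per-vCPU threads (TCG): ignore pinning entries that
         * merely mirror the VM-level mask, and only fail on a real conflict. */
        if (priv->nvcpupids == 0) {
            size_t i;
        
            for (i = 0; i < def->cputune.nvcpupin; i++) {
                if (!virBitmapEqual(def->cpumask,
                                    def->cputune.vcpupin[i]->cpumask)) {
                    virReportError(VIR_ERR_OPERATION_INVALID, "%s",
                                   _("cpu affinity is not supported"));
                    return -1;
                }
            }
            return 0;   /* VM-level affinity was already applied elsewhere */
        }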
    • M
  18. 06 Feb, 2015 1 commit
  19. 27 Jan, 2015 2 commits
  20. 19 Jan, 2015 2 commits
  21. 15 Jan, 2015 1 commit
    • J
      Fix vmdef usage while in monitor in qemu process · c749eda4
      Ján Tomko authored
      Make a local copy of the disk alias in qemuProcessInitPasswords,
      instead of referencing the one in the domain definition, which
      might get freed if the domain crashes while we're in the monitor.
      
      Also copy the memballoon period value.
      c749eda4
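      The pattern in a hedged nutshell (simplified fragment; the monitor call
      itself is elided):
      
        /* Copy what we need out of the definition before dropping into the
         * monitor; vm->def may be freed if the domain dies in the meantime. */
        char *alias = NULL;
        
        if (VIR_STRDUP(alias, disk->info.alias) < 0)
            return -1;
        
        qemuDomainObjEnterMonitor(driver, vm);
        /* ... use 'alias' here, never disk->info.alias ... */
        ignore_value(qemuDomainObjExitMonitor(driver, vm));
        
        VIR_FREE(alias);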
  22. 14 Jan, 2015 1 commit
    • P
      qemu_process: detect updated video ram size values from QEMU · ce745914
      Pavel Hrdina authored
      QEMU internally updates the size of video memory if the domain XML
      provided too low a memory size or there are dependencies between a QXL
      device's 'vgamem' and 'ram' sizes. We need to know about the changes
      and store them in the status XML so as not to break migration or
      managedsave across different libvirt versions.
      
      The values are loaded only if the "vgamem_mb" property exists for
      the device.  The presence of "vgamem_mb" also indicates that
      "ram_size" and "vram_size" exist for QXL devices.
      Signed-off-by: Pavel Hrdina <phrdina@redhat.com>
      ce745914
  23. 21 Dec, 2014 1 commit
    • M
      qemu: completely rework reference counting · 540c339a
      Martin Kletzander authored
      There is one problem that causes various errors in the daemon.  When a
      domain is waiting for a job, it is unlocked while waiting on the
      condition.  However, if that domain is for example transient and being
      removed in another API (e.g. cancelling incoming migration), it gets
      unref'd.  If the first call, the one that was waiting, fails to get the
      job, it unrefs the domain object, and because that was the last
      reference, the whole domain object is cleared.  However, when finishing
      the call, the domain must be unlocked, but there is no way for the API
      to know whether it was cleaned up or not (unless there is some ugly
      temporary variable, but let's scratch that).
      
      The root cause is that our APIs don't ref the objects they are using and
      all use the implicit reference that the object has when it is in the
      domain list.  That reference can be removed when the API is waiting for
      a job.  And because each API doesn't do its own ref'ing, it results in
      the ugly checking of the return value of virObjectUnref() that we have
      everywhere.
      
      This patch changes qemuDomObjFromDomain() to ref the domain (using
      virDomainObjListFindByUUIDRef()) and adds qemuDomObjEndAPI() which
      should be the only function in which the return value of
      virObjectUnref() is checked.  This makes all reference counting
      deterministic and makes the code a bit clearer.
      Signed-off-by: Martin Kletzander <mkletzan@redhat.com>
      540c339a
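      The resulting calling convention, sketched as a hedged fragment (both
      helpers are named in the commit; the behaviour noted for
      qemuDomObjEndAPI is an assumption about its shape):
      
        /* Every API now takes its own reference on lookup ... */
        virDomainObjPtr vm = qemuDomObjFromDomain(dom);  /* refs via FindByUUIDRef */
        if (!vm)
            return -1;
        
        /* ... works, possibly waiting on a job while unlocked ... */
        
        /* ... and releases it in exactly one place: */
        qemuDomObjEndAPI(&vm);   /* roughly: unlock, unref, set vm to NULL */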
  24. 19 Dec, 2014 2 commits
    • D
      disable vCPU pinning with TCG mode · 65686e5a
      Daniel P. Berrange authored
      Although QMP returns info about vCPU threads in TCG mode, the
      data it returns is mostly lies. Only the first vCPU has a valid
      thread_id returned. The thread_id given for the other vCPUs is
      in fact the main emulator thread. All vCPUs actually run under
      the same thread in TCG mode.
      
      Our vCPU pinning code is not at all able to cope with this,
      so if you try to set CPU affinity per-vCPU you end up with
      weird errors:
      
      error: Failed to start domain instance-00000007
      error: cannot set CPU affinity on process 24365: Invalid argument
      
      Since few people will care about the performance of TCG with
      strict CPU pinning, let's just disable that for now, so we get
      a clear error message
      
      error: Failed to start domain instance-00000007
      error: Requested operation is not valid: cpu affinity is not supported
      65686e5a
    • D
      Don't setup fake CPU pids for old QEMU · b07f3d82
      Daniel P. Berrange authored
      The code assumes that def->vcpus == nvcpupids, so when we setup
      fake CPU pids for old QEMU with nvcpupids == 1, we cause the
      later code to read off the end of the array. This has fun results
      like sched_setaffinity(0, ...) which changes libvirtd's own CPU
      affinity, or even better sched_setaffinity($RANDOM, ...) which
      changes the affinity of a random OS process.
      b07f3d82
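      Illustration of the overrun being removed, as a hedged fragment rather
      than the actual code:
      
        /* With nvcpupids == 1 but def->vcpus == 4, iterations 1..3 read past
         * the end of priv->vcpupids, so the PID handed to sched_setaffinity()
         * is garbage: 0 re-pins libvirtd itself, anything else re-pins a
         * random process. */
        for (i = 0; i < def->vcpus; i++) {
            if (virProcessSetAffinity(priv->vcpupids[i], cpumask) < 0)
                return -1;
        }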