- 28 Apr 2015, 40 commits
-
-
Committed by Daniel P. Berrange
The PortNumber data type is declared to derive from 'short'. Unfortunately, this is a signed type, so it validates the range [-32,768, 32,767], which excludes valid port numbers between 32,768 and 65,535. We can't use 'unsignedShort', since we need -1 to be a valid port number too. This change uses 'int' and sets an explicit max boundary instead of relying on the data type's built-in max. One of the existing tests is changed to use a high port number to validate the schema.
https://bugzilla.redhat.com/show_bug.cgi?id=1214664
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
(cherry picked from commit 615bdfda)
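To illustrate why 'short' was the wrong base type, here is a minimal C sketch. It assumes a hypothetical validator, port_number_is_valid(), that mirrors the revised constraint (-1 plus every port from 0 to 65535); neither helper below is libvirt code, and the real fix lives in the RNG schema, not in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the revised PortNumber constraint:
 * -1 ("not set") plus every real TCP/UDP port from 0 to 65535. */
static bool port_number_is_valid(int port)
{
    return port >= -1 && port <= 65535;
}

/* What a 'short'-derived type can hold: [-32768, 32767]. */
static bool fits_in_signed_short(int value)
{
    return value >= SHRT_MIN && value <= SHRT_MAX;
}
```

For example, port 40000 passes port_number_is_valid() but fails fits_in_signed_short(), which is exactly the class of port the old schema rejected.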
-
Committed by zhang bo
Just as b8e25c35 did, fall back to the ACPI method in qemuDomainReboot() when the guest agent is unresponsive.
Signed-off-by: YueWenyuan <yuewenyuan@huawei.com>
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit eadf41fe)
-
Committed by Roman Bogorodskiy
When running on FreeBSD, there's a bug in virCommandProcessIO polling that is triggered by commandtest. The commandtest case that triggers EPIPE (named "test20") hangs forever on FreeBSD. Apparently, this happens because FreeBSD sets the POLLHUP flag in revents when stdin is closed. As the current implementation only checks for POLLOUT and POLLERR, it loops forever inside virCommandProcessIO and never attempts the one more write() that would trigger EPIPE. To fix that, check for the POLLHUP flag along with POLLOUT and POLLERR.
(cherry picked from commit e34cccf7)
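The fix boils down to which revents bits count as "try writing again". A minimal sketch, assuming hypothetical predicate names (this is not the actual virCommandProcessIO code), contrasting the old and new checks:

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

/* Pre-fix check: misses the case where only POLLHUP is reported. */
static bool should_try_write_old(short revents)
{
    return (revents & (POLLOUT | POLLERR)) != 0;
}

/* Post-fix check: POLLHUP also means "attempt write()", so the next
 * write() can fail with EPIPE (assuming SIGPIPE is ignored) and the I/O
 * loop terminates instead of spinning forever. */
static bool should_try_write(short revents)
{
    return (revents & (POLLOUT | POLLERR | POLLHUP)) != 0;
}
```

With revents == POLLHUP, as FreeBSD reports for a closed reader, only the new predicate lets the loop make progress.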
-
Committed by Peter Krempa
Some storage protocols allow the @path field in struct virStorageSource to be NULL. Add NULLSTR() wrappers to handle this possibility until I finish the storage source error formatter.
(cherry picked from commit 62a61d58)
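A NULLSTR()-style wrapper substitutes a placeholder so that a NULL path never reaches a "%s" conversion, where it would be undefined behaviour. The sketch below is modeled on that idea; the macro name EXAMPLE_NULLSTR, the "<null>" placeholder text, and the struct are assumptions for illustration, not libvirt's definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Modeled on a NULLSTR()-style macro; the placeholder text is assumed. */
#define EXAMPLE_NULLSTR(s) ((s) ? (s) : "<null>")

/* Hypothetical stand-in for a storage source whose path may be NULL
 * (e.g. for some network protocols). */
struct example_storage_source {
    const char *path;
};

static int format_source_error(char *buf, size_t len,
                               const struct example_storage_source *src)
{
    /* Passing NULL to %s is undefined behaviour, so wrap the field. */
    return snprintf(buf, len, "cannot access storage file '%s'",
                    EXAMPLE_NULLSTR(src->path));
}
```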
-
Committed by Laine Stump
A further fix for: https://bugzilla.redhat.com/show_bug.cgi?id=1113474
Since no type of macvtap can work if the parent physdev it's attached to is offline, we should bring the physdev online at the same time as the macvtap. When taking the macvtap offline, it's also necessary to take the physdev offline for macvtap passthrough mode, because the physdev has the same MAC address as the macvtap device and so could cause problems with misdirected packets during migration, as outlined in commits 829770 and 879c13. We can't set the physdev offline for other macvtap modes because 1) in those modes there may be other macvtap devices attached to the same physdev (and/or the host itself may be using the device), whereas passthrough mode is exclusive to one macvtap at a time, and 2) there's no practical reason to do so anyway.
(cherry picked from commit 38172ed8)
-
Committed by Laine Stump
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1113474
When we set the MAC address of a network device as part of setting up macvtap "passthrough" mode (where the domain has an emulated netdev connected to a host macvtap device that has exclusive use of the physical device and sets the device's MAC address to match its own, i.e. "<interface type='direct'> <source mode='passthrough' .../>"), we use ioctl(SIOCSIFHWADDR), giving it the name of that device. This is true even if it is an SRIOV Virtual Function (VF).
But when we set the MAC address / vlan ID of a VF in preparation for "hostdev network" passthrough (where we set the MAC address and vlan ID of the VF after detaching the host net driver and before assigning the device to the domain with PCI passthrough, i.e. "<interface type='hostdev'>"), we do it via a netlink RTM_SETLINK message to that VF's Physical Function (PF), telling it which VF# to change. This sets an "administratively changed MAC" flag for that VF in the PF's driver, and from that point on (until the PF driver is reloaded, *not* merely the VF driver) the VF's MAC address can't be changed using ioctl(SIOCSIFHWADDR); the only way to change it is via the PF with RTM_SETLINK.
This means that if a VF is used for hostdev passthrough, it will have the admin flag set, and future attempts to use that VF for macvtap passthrough will fail.
The solution is to check whether the device being used for macvtap passthrough is actually a VF. If so, we send a netlink RTM_SETLINK message to the PF to set the VF's MAC address instead of calling ioctl(SIOCSIFHWADDR) directly on the VF; if not, behavior is unchanged. There are three pieces to making this work:
1) virNetDevMacVLan(Create|Delete)WithVPortProfile() now call virNetDev(Replace|Restore)NetConfig() rather than virNetDev(Replace|Restore)MacAddress() (simply passing -1 for VF# and vlanid).
2) virNetDev(Replace|Restore)NetConfig() check whether the device is a VF. If so, they find the PF's name and VF#, allowing them to call virNetDev(Replace|Restore)VfConfig().
3) To prevent mixups when detaching a macvtap passthrough device that was attached while running an older version of libvirt, virNetDevRestoreVfConfig() may be given the preserved name of the VF, and if the proper statefile for a VF can't be found in the stateDir (${stateDir}/${pfname}_vf${vfid}), virNetDevRestoreMacAddress() is called instead (which will look in the file named ${stateDir}/${vfname}).
This problem has existed in every version of libvirt that has both macvtap passthrough and interface type='hostdev'. Fortunately, people seem to use one or the other, so it hasn't caused any real-world problem reports.
(cherry picked from commit cb3fe38c)
-
Committed by Richard W.M. Jones
Commit 70f44663 (from 2008) introduced some functions for testing whether xend was returning correct sound models. Those functions are long gone, but their prototypes remain. This commit removes the unused prototypes.
Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
(cherry picked from commit 093eea95)
-
Committed by zhang bo
When a qemu domain is rebooted from outside, at the libvirt level it looks like a regular shutdown. To really restart the domain, libvirt needs to issue a reset command on the monitor once the SHUTDOWN event appears. So, to differentiate a bare shutdown from a reboot, libvirt uses a variable within the domain private data called fakeReboot. When the reboot API is called, the variable is set; when the shutdown API is called, it must be cleared. But it was not cleared in every possible case. So if the user called virDomainReboot() while no ACPI daemon was running inside the guest (so the guest didn't initiate the shutdown sequence), and then called virDomainShutdown(mode=agent), a bad thing happened: we remembered the fakeReboot and, instead of shutting the domain down, just rebooted it.
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
Signed-off-by: Wang Yufei <james.wangyufei@huawei.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit 8be502fd)
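The invariant behind this fix is that every shutdown path must clear the reboot flag, whatever the shutdown mode. A heavily simplified sketch; the struct, enum, and functions are hypothetical stand-ins for the domain private data and the driver's reboot/shutdown handling, not libvirt code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified per-domain private data. */
struct example_domain_priv {
    bool fake_reboot;
};

enum example_shutdown_mode { EXAMPLE_SHUTDOWN_ACPI, EXAMPLE_SHUTDOWN_AGENT };

static void example_reboot(struct example_domain_priv *priv)
{
    /* A reboot is a shutdown that should be followed by a reset. */
    priv->fake_reboot = true;
}

static void example_shutdown(struct example_domain_priv *priv,
                             enum example_shutdown_mode mode)
{
    (void)mode;  /* the flag is cleared regardless of the mode used */
    priv->fake_reboot = false;
}

/* On the SHUTDOWN event: true means "reset the guest", false "power off". */
static bool example_on_shutdown_event(const struct example_domain_priv *priv)
{
    return priv->fake_reboot;
}
```

Calling example_reboot() and then example_shutdown(..., EXAMPLE_SHUTDOWN_AGENT) now leads to a real power-off, which is the scenario from the commit message.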
-
Committed by Michal Privoznik
https://bugzilla.redhat.com/show_bug.cgi?id=1211436
This reverts commit b7829f95. The previous fix was not correct. As everywhere else, a driver is a global variable allocated in the stateInitialize function (or something similar for stateless drivers). Later, when a driver API is called, the global variable may be accessed and dereferenced. Now, some drivers require root privileges because they undertake actions reserved for the system admin (e.g. manipulating the host firewall). And here's the trouble: the NWFilter state initializer exited too early when it found it was running unprivileged, leaving the global NWFilter driver variable uninitialized. Any subsequent API call that tried to lock the driver dereferenced it and thus crashed. On the other hand, in order not to resurrect the bug the original commit was fixing, let's forbid nwfilter define in session mode.
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
Conflicts:
    src/nwfilter/nwfilter_driver.c: Context. Code changed a bit since 2013.
(cherry picked from commit 77d92e2e)
-
Committed by Michal Privoznik
There is a possibility that we jump to the error label with @lockpath still set to NULL. There, @lockpath should be unlink()-ed, but passing NULL to unlink() is not a good idea. Don't do that. In fact, we should call unlink() only if we created the lock file successfully.
Reported-by: John Ferlan <jferlan@redhat.com>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit 1fdac3d9)
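The "unlink only what you created" rule can be sketched with a single error label, assuming a hypothetical example_create_lock() function (not the code touched by this commit). The fd doubles as the "we created it" marker, so a NULL path, a failed open(), or someone else's existing lock file is never removed; 'fail_after_create' merely simulates a later error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical sketch: returns an open fd on success, -1 on failure. */
static int example_create_lock(const char *lockpath, bool fail_after_create)
{
    int fd = -1;

    if (!lockpath)
        goto error;              /* nothing created, nothing to unlink */

    /* O_EXCL: fail rather than take over an existing lock file. */
    fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        goto error;              /* the file (if any) is not ours */

    if (fail_after_create)
        goto error;              /* simulated later failure */

    return fd;

 error:
    if (fd >= 0) {
        /* Only now do we know we created the file, so remove it. */
        close(fd);
        unlink(lockpath);
    }
    return -1;
}
```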
-
Committed by Jiri Denemark
When acquiring a resource via sanlock fails, we would report it as VIR_ERR_INTERNAL_ERROR, which is not very friendly to applications using libvirt. Moreover, the lockd driver reports the same failure as VIR_ERR_RESOURCE_BUSY, which looks better. Unfortunately, in the sanlock driver we don't really know whether acquiring the resource failed because it was already locked or for some other reason. But the end result is the same, and I think using VIR_ERR_RESOURCE_BUSY for all acquire failures is still better than what we have now.
https://bugzilla.redhat.com/show_bug.cgi?id=1165119
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
(cherry picked from commit 4864e377)
-
Committed by Michal Privoznik
When pre-creating storage for domains, we need to find the corresponding disk in the XML on the destination (the domain XML may differ there, e.g. the disk may be accessible under a different path). For better debugging, I'm printing all the info I received about a disk. But there was a typo when printing the disk capacity: "%lluu" instead of "%llu".
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit 65a88572)
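The effect of the typo is easy to see: "%llu" consumes the unsigned long long, and the extra 'u' in "%lluu" is printed as a literal character. A small sketch, with format_capacity() as a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper using the corrected conversion. */
static void format_capacity(char *buf, size_t len, unsigned long long capacity)
{
    snprintf(buf, len, "capacity=%llu", capacity);
}
```

Formatting 1073741824 yields "capacity=1073741824"; the typoed "capacity=%lluu" would yield "capacity=1073741824u" instead.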
-
Committed by Xing Lin
The problem with the previous implementation is that, even when qemuMigrationUpdateJobStatus() detects that a migration job has completed, it still sleeps for 50 ms, which is unnecessary and only adds to the VM pause time.
Signed-off-by: Xing Lin <xinglin@cs.utah.edu>
Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
(cherry picked from commit 522e81cb)
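The fix amounts to checking the job status before sleeping rather than after. A sketch of that loop shape, assuming a hypothetical status callback (the real code queries QEMU); it returns how many 50 ms sleeps were taken so the behaviour is observable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical job-status callback: true once the migration finished. */
typedef bool (*example_status_fn)(void *opaque);

static void example_sleep_50ms(void)
{
    struct timespec ts = { 0, 50L * 1000L * 1000L };
    nanosleep(&ts, NULL);
}

/* Sleep only while the job is still running, so a completed job is
 * noticed without an extra 50 ms of VM pause time. */
static int example_wait_for_completion(example_status_fn job_done, void *opaque)
{
    int sleeps = 0;

    for (;;) {
        if (job_done(opaque))
            break;               /* finished: return immediately */
        example_sleep_50ms();    /* still running: wait, then re-check */
        sleeps++;
    }
    return sleeps;
}

/* Demo callback: reports completion once *remaining drops below zero. */
static bool example_done_after(void *opaque)
{
    int *remaining = opaque;
    return (*remaining)-- <= 0;
}
```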
-
Committed by Michael Chapman
Currently we check qemuCaps before starting the block job. But qemuCaps isn't available on a stopped domain, which means we get a misleading error message in this case:
    # virsh domstate example
    shut off
    # virsh blockjob example vda
    error: unsupported configuration: block jobs not supported with this QEMU binary
Move the qemuCaps check into the block job so that we are guaranteed the domain is running.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit cfcdf5ff)
-
Committed by Michael Chapman
qemuMigrationCookieAddNBD is usually called from within an async MIGRATION_OUT or MIGRATION_IN job, so it needs to start a nested job. (The one exception is during the Begin phase when change protection isn't enabled, but qemuDomainObjEnterMonitorAsync will behave the same as qemuDomainObjEnterMonitor in this case.)
This bug was encountered with a libvirt client that repeatedly queries the disk mirroring block job info during a migration. If one of these queries occurs just as the Perform migration cookie is baked, libvirt crashes. Relevant logs are as follows:
    6701: warning : qemuDomainObjEnterMonitorInternal:1544 : This thread seems to be the async job owner; entering monitor without asking for a nested job is dangerous
    [1] 6701: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block","id":"libvirt-629"}
    [2] 6699: info : qemuMonitorIOWrite:503 : QEMU_MONITOR_IO_WRITE: mon=0x7fefdc004700 buf={"execute":"query-block","id":"libvirt-629"}
    [3] 6704: info : qemuMonitorSend:972 : QEMU_MONITOR_SEND_MSG: mon=0x7fefdc004700 msg={"execute":"query-block-jobs","id":"libvirt-630"}
    [4] 6699: info : qemuMonitorJSONIOProcessLine:203 : QEMU_MONITOR_RECV_REPLY: mon=0x7fefdc004700 reply={"return": [...], "id": "libvirt-629"}
    6699: error : qemuMonitorJSONIOProcessLine:211 : internal error: Unexpected JSON reply '{"return": [...], "id": "libvirt-629"}'
At [1] qemuMonitorBlockStatsUpdateCapacity sends its request, then waits on mon->notify. At [2] the request is written out to the monitor socket. At [3] qemuMonitorBlockJobInfo sends its request and also waits on mon->notify. The reply to the first request is received at [4]. However, qemuMonitorJSONIOProcessLine is not expecting this reply, since the second request hadn't finished sending. The reply is dropped and an error is returned.
qemuMonitorIO signals mon->notify twice during its error handling, waking up both of the threads waiting on it. One of them clears mon->msg as it exits qemuMonitorSend; the other crashes:
    qemuMonitorSend (mon=0x7fefdc004700, msg=<value optimized out>)
        at qemu/qemu_monitor.c:975
    975         while (!mon->msg->finished) {
    (gdb) print mon->msg
    $1 = (qemuMonitorMessagePtr) 0x0
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit 72df8314)
-
Committed by Michael Chapman
The close callbacks hash is keyed by a UUID string, but virCloseCallbacksRun was attempting to remove entries by raw UUID. This patch ensures the callback entries are removed by UUID string as well.
This bug caused problems when guest migrations were abnormally aborted:
    # timeout --signal KILL 1 \
        virsh migrate example qemu+tls://remote/system \
        --verbose --compressed --live --auto-converge \
        --abort-on-error --unsafe --persistent \
        --undefinesource --copy-storage-all --xml example.xml
    Killed
    # virsh migrate example qemu+tls://remote/system \
        --verbose --compressed --live --auto-converge \
        --abort-on-error --unsafe --persistent \
        --undefinesource --copy-storage-all --xml example.xml
    error: Requested operation is not valid: domain 'example' is not being migrated
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit fa2607d5)
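Why the raw-UUID removal never matched: the hash keys are the 36-character textual form, while a raw UUID is 16 binary bytes. A sketch with a hypothetical formatter (libvirt has its own UUID formatting helper; this one only assumes the canonical 8-4-4-4-12 hex layout):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EXAMPLE_UUID_BUFLEN 16          /* raw bytes */
#define EXAMPLE_UUID_STRING_BUFLEN 37   /* 36 characters + NUL */

/* Hypothetical formatter: 16 raw bytes -> canonical 8-4-4-4-12 hex string,
 * the form a string-keyed hash table expects. */
static void example_uuid_format(const unsigned char uuid[EXAMPLE_UUID_BUFLEN],
                                char out[EXAMPLE_UUID_STRING_BUFLEN])
{
    snprintf(out, EXAMPLE_UUID_STRING_BUFLEN,
             "%02x%02x%02x%02x-%02x%02x-%02x%02x-%02x%02x-"
             "%02x%02x%02x%02x%02x%02x",
             uuid[0], uuid[1], uuid[2], uuid[3], uuid[4], uuid[5],
             uuid[6], uuid[7], uuid[8], uuid[9], uuid[10], uuid[11],
             uuid[12], uuid[13], uuid[14], uuid[15]);
}
```

Looking up or removing an entry with the raw bytes compares binary data against hex text and so never finds the key; formatting first, as the patch does, makes the keys agree.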
-
Committed by Michael Chapman
If a VM migration is aborted, QEMU may fail a disk mirror before libvirt has a chance to cancel it. The disk->mirrorState remains at _ABORT in this case, and this breaks subsequent mirrorings of that disk. We should instead check the mirrorState directly and transition to _NONE if it is already aborted. Do the check *after* aborting the block job in QEMU to avoid a race.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit e5d729ba)
-
Committed by Michael Chapman
If virCloseCallbacksSet fails, qemuMigrationBegin must return NULL to indicate that an error occurred.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit 77ddd0bb)
-
Committed by Michael Chapman
The destination libvirt daemon in a migration may segfault if the client disconnects immediately after the migration has begun:
    # virsh -c qemu+tls://remote/system list --all
     Id    Name                           State
    ----------------------------------------------------
    ...
    # timeout --signal KILL 1 \
        virsh migrate example qemu+tls://remote/system \
        --verbose --compressed --live --auto-converge \
        --abort-on-error --unsafe --persistent \
        --undefinesource --copy-storage-all --xml example.xml
    Killed
    # virsh -c qemu+tls://remote/system list --all
    error: failed to connect to the hypervisor
    error: unable to connect to server at 'remote:16514': Connection refused
The crash is in:
    1531 void
    1532 qemuDomainObjEndJob(virQEMUDriverPtr driver, virDomainObjPtr obj)
    1533 {
    1534     qemuDomainObjPrivatePtr priv = obj->privateData;
    1535     qemuDomainJob job = priv->job.active;
    1536
    1537     priv->jobs_queued--;
Backtrace:
    #0 at qemuDomainObjEndJob at qemu/qemu_domain.c:1537
    #1 in qemuDomainRemoveInactive at qemu/qemu_domain.c:2497
    #2 in qemuProcessAutoDestroy at qemu/qemu_process.c:5646
    #3 in virCloseCallbacksRun at util/virclosecallbacks.c:350
    #4 in qemuConnectClose at qemu/qemu_driver.c:1154
    ...
qemuDomainRemoveInactive calls virDomainObjListRemove, which in this case is holding the last remaining reference to the domain. qemuDomainRemoveInactive then calls qemuDomainObjEndJob, but by then the domain object has been freed and poisoned.
This patch bumps the domain's refcount until qemuDomainRemoveInactive has completed. We also ensure qemuProcessAutoDestroy does not return the domain to virCloseCallbacksRun to be unlocked in this case. There is similar logic in bhyveProcessAutoDestroy and lxcProcessAutoDestroy (which call virDomainObjListRemove directly).
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit 7578cc17)
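The refcount bump follows a standard pattern: take your own reference before an operation that may drop the last one, and release it only after you are done with the object. A minimal sketch with a hypothetical refcounted object (names are illustrative, not libvirt's object API):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted domain object. */
struct example_dom {
    int refs;
    int jobs_queued;
};

static int example_frees;   /* counts freed objects, for illustration */

static void example_ref(struct example_dom *d)
{
    d->refs++;
}

static void example_unref(struct example_dom *d)
{
    if (--d->refs == 0) {
        free(d);
        example_frees++;
    }
}

/* Stand-in for removing the domain from the list: drops the list's ref. */
static void example_list_remove(struct example_dom *d)
{
    example_unref(d);
}

static void example_end_job(struct example_dom *d)
{
    d->jobs_queued--;   /* a use-after-free without the extra reference */
}

static void example_remove_inactive(struct example_dom *d)
{
    example_ref(d);            /* keep 'd' alive across the removal */
    example_list_remove(d);    /* may drop the last list reference */
    example_end_job(d);        /* still safe: we hold our own reference */
    example_unref(d);          /* only now may the object be freed */
}
```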
-
Committed by Peter Krempa
The block copy API takes the speed in bytes/s rather than in MiB/s, which was the prior approach in virDomainBlockRebase. We correctly converted the speed to bytes/s in the old API, but we still called the common helper virDomainBlockCopyCommon with the unadjusted variable.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1207122
(cherry picked from commit 3c6a72d5)
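The unit conversion itself is a shift by 20 bits (1 MiB = 2^20 bytes), guarded against overflow; the bug was simply that the converted value was not the one passed on. A sketch, assuming a hypothetical helper name:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: MiB/s (old rebase-style bandwidth) -> bytes/s
 * (block copy API). Returns -1 instead of silently overflowing. */
static int example_mib_to_bytes(unsigned long long mib,
                                unsigned long long *bytes)
{
    if (mib > ULLONG_MAX >> 20)
        return -1;
    *bytes = mib << 20;   /* 1 MiB = 1048576 bytes */
    return 0;
}
```

Whatever the caller does next should use *bytes, not the original mib argument.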
-
Committed by Luyao Huang
The overflow check for the bandwidth parameter did not jump to the cleanup label. Additionally, virsh should use vshError instead of virReportError.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1206987
Signed-off-by: Luyao Huang <lhuang@redhat.com>
(cherry picked from commit 390f218b)
-
Committed by Shanzhi Yu
According to the code, blockcopy to a non-file destination is not supported, but a 'goto endjob' was missing after checking the destination. This led to calling drive-mirror with wrong parameters.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1206406
Signed-off-by: Shanzhi Yu <shyu@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
(cherry picked from commit c5fbad66)
-
Committed by Wei Huang
Current libvirt can only handle up to 1023 bytes when it reads the Linux sysfs file topology/thread_siblings. This isn't enough for Linux distributions that support a large number of CPUs. This patch fixes the problem by using VIR_ALLOC()/VIR_FREE() instead of a fixed-size (1024) local char array. Meanwhile, SYSFS_THREAD_SIBLINGS_LIST_LENGTH_MAX is increased to 8192, which should be large enough for the foreseeable future.
Signed-off-by: Wei Huang <wei@redhat.com>
(cherry picked from commit c13de016)
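Moving from a fixed 1024-byte stack array to a heap buffer sized by the new limit can be sketched as below. The function and macro names are hypothetical, and plain malloc()/free() stand in for libvirt's VIR_ALLOC()/VIR_FREE() wrappers; reading from a FILE * keeps the sketch independent of any real sysfs path.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Same value as the new limit; the name here is illustrative only. */
#define EXAMPLE_THREAD_SIBLINGS_LIST_LENGTH_MAX 8192

/* Read a sysfs-style list into a heap buffer instead of a fixed 1024-byte
 * stack array. Returns a NUL-terminated string the caller must free(),
 * or NULL on error. */
static char *example_read_siblings(FILE *fp)
{
    char *buf = malloc(EXAMPLE_THREAD_SIBLINGS_LIST_LENGTH_MAX);
    if (!buf)
        return NULL;

    size_t n = fread(buf, 1, EXAMPLE_THREAD_SIBLINGS_LIST_LENGTH_MAX - 1, fp);
    if (n == 0 && ferror(fp)) {
        free(buf);
        return NULL;
    }
    buf[n] = '\0';
    return buf;
}
```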
-
Committed by Eric Blake
On IRC, Hydrar pointed out a problem where 'virsh edit' failed on his domain created through an iSCSI pool managed by virt-manager, all because the XML included a block device with colons in its name.
* docs/schemas/basictypes.rng (absFilePath): Add colon as safe.
* tests/qemuxml2argvdata/qemuxml2argv-disk-iscsi.xml: New file.
* tests/qemuxml2argvdata/qemuxml2argv-disk-iscsi.args: Likewise.
* tests/qemuxml2argvtest.c (mymain): Test it.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit dfc70875)
-
Committed by Jiri Denemark
Because of the microcode update to Haswell/Broadwell CPUs, existing domains using these CPUs may fail to start even though they used to run just fine. To help users solve this issue, we suggest switching to the -noTSX variant of the CPU model:
    virsh # start cd
    error: Failed to start domain cd
    error: unsupported configuration: guest and host CPU are not compatible: Host CPU does not provide required features: rtm, hle; try using 'Haswell-noTSX' CPU model
Signed-off-by: Jiri Denemark <jdenemar@redhat.com>
(cherry picked from commit 53c8062f)
-
Committed by Amy Fong
In some circumstances where the build tree differs from the source tree, libvirt's build tries to create the symlink for cpu_map.xml before creating the directory $(abs_builddir)/cpu:
    'src/cpu/cpu_map.xml': No such file or directory
Do not create the symlink; it is no longer needed after commit e562e82f ("Load CPU map from builddir when run uninstalled").
Signed-off-by: Amy Fong <amy.fong@windriver.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
(cherry picked from commit 237ffd1b)
-
Committed by Guido Günther
When using QEMU's 9pfs, the target "dir" element is not necessarily an absolute path but merely an arbitrary identifier. So validation currently fails in that case with the misleading:
    $ virt-xml-validate /tmp/test.xml
    Relax-NG validity error : Extra element devices in interleave
    /tmp/test.xml:24: element devices: Relax-NG validity error : Element domain failed to validate content
    /tmp/test.xml fails to validate
(cherry picked from commit db1edae8)
-
Committed by Ján Tomko
Commit bab2eda6 changed the behavior for a missing compat attribute but failed to update the documentation. Before, the option was omitted from the qemu-img command line and the qemu-img default was used. Now we always specify the compat value, and the default is 0.10.
Reported by Christophe Fergeau: https://bugzilla.gnome.org/show_bug.cgi?id=746660#c4
(cherry picked from commit 7c8ae42d)
-
Committed by Laine Stump
While debugging the support for responding to qemu RX_FILTER_CHANGED events, I changed the "ignoring this event" log message from VIR_DEBUG to VIR_WARN, but forgot to change it back before pushing. Since many guest OSes change multicast lists and/or promiscuous mode settings often enough to trigger this message, it's starting to show up as a red herring in bug reports.
(cherry picked from commit dae3e246)
-
Committed by Luyao Huang
Commit 5bba61fd changed the XPath strings used when parsing the VM NUMA configuration to be absolute. Unfortunately, the <domain> element is not the top-level element when parsing the domain status XML, so the absolute XPath string doesn't match. Use the relative string so that the <numa> settings are not lost.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
(cherry picked from commit d75e23bb)
-
Committed by Michael Chapman
Commit cf54c606 introduced the ability to create missing storage volumes during migration. For network disks, however, we may not be able to detect whether they already exist: there is no straightforward way to map the disk to a storage volume, and even if there were, it's possible that no configured storage pool actually contains the disk. It is better to assume the network disk exists in this case rather than abort the migration completely. If the volume really is missing, QEMU will generate an appropriate error later in the migration.
Signed-off-by: Michael Chapman <mike@very.puzzling.org>
(cherry picked from commit a1b18051)
-
Committed by Luyao Huang
https://bugzilla.redhat.com/show_bug.cgi?id=1196934
When qemu exits during startup, libvirt includes the error from /var/log/libvirt/qemu/vm.log in the error message:
    $ virsh start test3
    error: Failed to start domain test3
    error: internal error: early end of file from monitor: possible problem:
    2015-02-27T03:03:16.985494Z qemu-kvm: -numa memdev is not supported by machine rhel6.5.0
The check for domain liveness added to qemuDomainObjExitMonitor in commit dc2fd51f sometimes overwrites this error:
    $ virsh start test3
    error: Failed to start domain test3
    error: operation failed: domain is no longer running
Fix the check to report an error only if none is already set.
Signed-off-by: Luyao Huang <lhuang@redhat.com>
Signed-off-by: Ján Tomko <jtomko@redhat.com>
(cherry picked from commit 4f068209)
-
Committed by Jim Fehlig
When converting domain XML from native format, the libxl driver was overwriting useful errors from the xenconfig parsing code with a useless, generic one, e.g. "internal error: parsing xm config failed" instead of "internal error: config value usbdevice was malformed". Remove the redundant (and useless) error reporting in the libxl driver.
Signed-off-by: Jim Fehlig <jfehlig@suse.com>
(cherry picked from commit bd235cd8)
-
Committed by Laine Stump
Investigation of a problem with creating passthrough macvtap devices (https://bugzilla.redhat.com/show_bug.cgi?id=1185501) has shown that this slightly more verbose failure message is useful. In particular, the MAC address can be used to determine the domain. You could also figure this out by looking at preceding messages in a debug log, but this puts everything in a single place.
(cherry picked from commit 72423df9)
-
Committed by Peter Krempa
(cherry picked from commit e7974b4f)
-
Committed by Peter Krempa
Only selected fields from the disk source were copied when cold-updating the source of a CDROM drive. When such a drive was backed by a network file, this resulted in corruption of the definition (note the '(null)' appended to the volume name):
    <disk type='network' device='cdrom'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='gluster' name='gluster-vol1(null)'>
        <host name='localhost'/>
      </source>
      <target dev='vdc' bus='virtio'/>
      <readonly/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </disk>
Update the whole source instead of cherry-picking elements.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1166024
(cherry picked from commit d0dc6c03)
-
Committed by Erik Skultety
We interpret port values as signed int (converting them from char *), so if a negative value is provided in a network disk's configuration, we accept it as valid; however, an 'unknown cause' error is raised later. That error only appears by accident: because the port value is passed back in the function's return code, a negative port looks like a failure return. This patch adds a minor tweak to the existing check so that we reject negative values the same way we reject non-numeric strings.
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1163553
(cherry picked from commit 84646165)
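A parser that avoids both problems returns status and value separately, so a negative port can never be mistaken for an error code, and it rejects negative input the same way as non-numeric input. This is a hypothetical sketch (libvirt has its own string-to-integer helpers); the 65535 upper bound is an extra assumption of the sketch, not part of this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 and stores the port on success, -1 on any
 * invalid input. The port goes through an out-parameter, not the return
 * value, so it cannot be confused with an error code. */
static int example_parse_port(const char *str, int *port)
{
    char *end = NULL;
    long val;

    if (!str || !*str)
        return -1;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;               /* not a (complete) number */
    if (val < 0 || val > 65535)
        return -1;               /* negative (or too large): reject too */

    *port = (int)val;
    return 0;
}
```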
-
Committed by Eric Blake
Commit f182da20 (v1.2.6) caused a slight regression in virsh reporting of a non-active block commit job: where it used to state "Commit complete", it now states "Now in synchronized phase". But the synchronized phase is only possible for an active commit. As a reproducer, I created a chain 'a <- b <- c <- d <- e' and ran:
    virsh blockcommit $dom vda --top c --base a --verbose --wait
* tools/virsh-domain.c (cmdBlockCommit): Synchronized phase is only possible on active commits.
Signed-off-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit ceec58ac)
-
Committed by zhang bo
Introduced by f6a2f97e.
Problem description:
After migrating a domain multiple times, where the domain has an ovs interface with no portData set and a non-shared disk, the nbd ports overflowed.
Steps to reproduce the problem:
1. Define and start a domain with its network configured as:
    <interface type='bridge'>
      <source bridge='br0'/>
      <virtualport type='openvswitch'>
      </virtualport>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
    </interface>
2. Do not set the network's portData.
3. Migrate it (ToURI2) with flags 91 (binary 1011011), which means:
    VIR_MIGRATE_LIVE
    VIR_MIGRATE_PEER2PEER
    VIR_MIGRATE_PERSIST_DEST
    VIR_MIGRATE_UNDEFINE_SOURCE
    VIR_MIGRATE_NON_SHARED_DISK
4. The migration succeeds, but an error appears in libvirtd.log:
    error : virCommandWait:2423 : internal error: Child process (ovs-vsctl --timeout=5 get Interface vnet1 external_ids:PortData) unexpected exit status 1: ovs-vsctl: no key "PortData" in Interface record "vnet1" column external_ids
5. Migrate it back, migrate it again, migrate it back, and so on.
6. The nbd ports overflow.
The reasons for the problem are:
1. virNetDevOpenvswitchGetMigrateData() treats it as an error if no portData is available for the ovs interface of a domain. (We think that's not appropriate, as portData is just OPTIONAL.)
2. In qemuMigrationBakeCookie(), qemuMigrationCookieAddNetwork() fails, so the function returns -1. qemuMigrationCookieAddNBD() is never called, and mig->nbd is still NULL.
3. However, qemuMigrationRun() only *WARNs* if qemuMigrationBakeCookie() fails, so the migration still succeeds. The cookie is NULL; it's not baked on the source side.
4. On the destination side, libvirt allocates a port first and later frees the nbd port recorded in the COOKIE. But the cookie is NULL because of the qemuMigrationCookieAddNetwork() failure on the source side, so the nbd port is never freed.
In this patch, we add the "--if-exists" option so that ovs-vsctl does not raise an error if there's no portData available. Furthermore, because portData may be NULL in the cookie on the destination side, check it before setting portData.
Signed-off-by: Zhou Yimin <zhouyimin@huawei.com>
Signed-off-by: Zhang Bo <oscar.zhangbo@huawei.com>
(cherry picked from commit 25df57db)
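Both halves of the fix are small, so here is a sketch assuming hypothetical helper names. One function builds an ovs-vsctl argument vector that includes "--if-exists"; the exact argument order shown is an assumption of this sketch. The other treats a NULL portData from the cookie as "nothing to apply" rather than an error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: fill argv with a NULL-terminated ovs-vsctl command that
 * does not fail when the PortData key is absent. Returns the argument
 * count (excluding the NULL), or 0 if 'max' is too small. */
static size_t example_build_get_portdata_argv(const char *ifname,
                                              const char *argv[], size_t max)
{
    const char *args[] = {
        "ovs-vsctl", "--timeout=5", "--if-exists", "get", "Interface",
        ifname, "external_ids:PortData", NULL
    };
    size_t n = sizeof(args) / sizeof(args[0]);

    if (n > max)
        return 0;
    memcpy(argv, args, n * sizeof(args[0]));
    return n - 1;
}

/* Hypothetical destination-side step: portData is optional, so a NULL
 * value from the cookie is skipped. Returns 1 if something was applied. */
static int example_apply_portdata(const char *portdata)
{
    if (!portdata)
        return 0;    /* nothing to set; not an error */
    /* ... set external_ids:PortData on the interface here ... */
    return 1;
}
```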
-