qemu_hotplug: Fix a rare race condition when detaching a device twice
https://bugzilla.redhat.com/show_bug.cgi?id=1623389 If a device is detached twice from the same domain the following race condition may happen: 1) The first DetachDevice() call will issue "device_del" on qemu monitor, but since the DEVICE_DELETED event did not arrive in time, the API ends claiming "Device detach request sent successfully". 2) The second DetachDevice() therefore still find the device in the domain and thus proceeds to detaching it again. It calls EnterMonitor() and qemuMonitorSend() trying to issue "device_del" command again. This gets both domain lock and monitor lock released. 3) At this point, qemu sends us the DEVICE_DELETED event which is going to be handled by the event loop which ends up calling qemuDomainSignalDeviceRemoval() to determine who is going to remove the device from domain definition. Whether it is the caller that marked the device for removal or whether it is going to be the event processing thread. 4) Because the device was marked for removal, qemuDomainSignalDeviceRemoval() returns true, which means the event is to be processed by the thread that has marked the device for removal (and is currently still trying to issue "device_del" command) 5) The thread finally issues the "device_del" command, which fails (obviously) and therefore it calls qemuDomainResetDeviceRemoval() to reset the device marking and quits immediately after, NOT removing any device from the domain definition. At this point, the device is still present in the domain definition but doesn't exist in qemu anymore. Worse, there is no way to remove it from the domain definition. Solution is to note down that we've seen the event and if the second "device_del" fails, not take it as a failure but carry on with the usual execution. Signed-off-by: NMichal Privoznik <mprivozn@redhat.com> ACKed-by: NPeter Krempa <pkrempa@redhat.com>
Showing
想要评论请 注册 或 登录