    qemu: drop driver lock while trying to terminate qemu process · 595e26c0
    Committed by Laine Stump
    This patch is based on an earlier patch by Eric Blake which was never
    committed:
    
    https://www.redhat.com/archives/libvir-list/2011-November/msg00243.html
    
    Aside from rebasing, this patch only drops the driver lock once (prior
    to the first time the function sleeps), then leaves it dropped until
    it returns (Eric's patch would drop and re-acquire the lock around
    each call to sleep).
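    A minimal sketch of the "drop once, re-acquire on return" approach,
    in C. This is not libvirt's code. A plain pthread mutex stands in for
    the driver lock, and wait_for_exit(), still_running_fn and the fake
    probe are hypothetical names. The lock is released just before the
    first sleep and taken back only on the way out, so however long the
    timeout is, other threads are not blocked while this one waits.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the qemu driver lock (not libvirt's API). */
static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical probe: returns true while the qemu process is still alive. */
typedef bool (*still_running_fn)(void *opaque);

static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Must be called with driver_lock held; returns with it held again.
 * The lock is released once, just before the first sleep, and is only
 * re-acquired on the way out, instead of being cycled around each sleep.
 * Returns true if the process went away within max_polls sleeps; false
 * means the caller should escalate (e.g. to SIGKILL). */
static bool wait_for_exit(still_running_fn still_running, void *opaque,
                          int max_polls, long poll_ms)
{
    bool unlocked = false;
    bool exited = false;

    assert(max_polls >= 0);
    for (int i = 0; ; i++) {
        if (!still_running(opaque)) {
            exited = true;
            break;
        }
        if (i == max_polls)
            break;                      /* timed out */
        if (!unlocked) {
            pthread_mutex_unlock(&driver_lock);
            unlocked = true;
        }
        sleep_ms(poll_ms);              /* other threads may take driver_lock */
    }

    if (unlocked)
        pthread_mutex_lock(&driver_lock);
    return exited;
}

/* Fake probe for exercising the sketch: the "process" survives polls_left
 * checks, and we count how many checks ran with driver_lock held. */
struct fake_proc {
    int polls_left;
    int checks;
    int checks_with_lock_held;
};

static bool fake_still_running(void *opaque)
{
    struct fake_proc *p = opaque;

    p->checks++;
    if (pthread_mutex_trylock(&driver_lock) == 0)
        pthread_mutex_unlock(&driver_lock);   /* lock was free */
    else
        p->checks_with_lock_held++;
    return p->polls_left-- > 0;
}
```

    With the fake probe, only the very first check runs with the lock
    held; every later check and every sleep runs with it released.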
    
    At the time Eric sent his patch, the response (from Dan Berrange) was
    that, while it wasn't a good thing to be holding the driver lock while
    sleeping, we really needed to rethink locking with respect to the
    driver object, switching to a finer-grained approach that locks
    individual items within the driver object separately to allow for
    greater concurrency.
    
    This is a good plan, and at the time it made sense to not apply the
    patch because there was no known bug related to the driver lock being
    held in this function.
    
    However, we now know that the wait in qemuProcessKill is sometimes
    too short to let the qemu process fully flush its disk cache before
    SIGKILL is sent, so we need to lengthen the timeout (to improve the
    situation with management applications until they can be updated to
    use the new VIR_DOMAIN_DESTROY_GRACEFUL flag added in commit
    72f8a7f1). But if we lengthen the timeout, we also lengthen the
    amount of time that all other threads in libvirtd are essentially
    blocked from doing anything, since just about everything needs to
    acquire the driver lock, if only for long enough to get a pointer to
    a domain.
    
    The solution is to modify qemuProcessKill to drop the driver lock
    while sleeping, as proposed in Eric's patch. Then we can increase the
    timeout with a clear conscience, and thus at least lower the chances
    that someone running existing management software will suffer the
    consequences of qemu's disk cache not being flushed.
    
    In the meantime, we should still work on Dan's proposal to make
    locking within the driver object more fine-grained.
    
    (NB: although I couldn't find any instance where qemuProcessKill() was
    called with no job active for the domain, or with no other guarantee
    that the current thread held at least one reference on the domain
    object, this patch still follows Eric's method of temporarily adding
    a ref prior to unlocking the domain object, because I couldn't
    convince myself 100% that such a guarantee always exists.)
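    The NB above comes down to a pin-then-unlock pattern. The sketch
    below is not libvirt's code: it assumes pthread mutexes in place of
    the driver and domain object locks, a driver-before-object lock
    order, and hypothetical names (dom_obj, dom_unref(), run_unlocked()).
    If some other holder drops its reference while the locks are
    released, the temporary reference keeps the object from being freed
    under the caller; the caller's own unref then frees it cleanly
    instead of re-locking freed memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins: driver_lock for the qemu driver lock, and a
 * refcounted domain object whose refs field is guarded by its own lock. */
static pthread_mutex_t driver_lock = PTHREAD_MUTEX_INITIALIZER;

struct dom_obj {
    pthread_mutex_t lock;
    int refs;
};

static struct dom_obj *dom_new(void)
{
    struct dom_obj *obj = calloc(1, sizeof(*obj));

    if (!obj)
        return NULL;
    pthread_mutex_init(&obj->lock, NULL);
    obj->refs = 1;
    return obj;
}

/* Called with obj->lock held. Drops one reference; if it was the last,
 * unlocks and frees obj and returns false (obj must not be used again). */
static bool dom_unref(struct dom_obj *obj)
{
    assert(obj->refs > 0);
    if (--obj->refs > 0)
        return true;
    pthread_mutex_unlock(&obj->lock);
    pthread_mutex_destroy(&obj->lock);
    free(obj);
    return false;
}

/* Called with driver_lock and obj->lock held (order: driver, then obj).
 * Pins obj with an extra reference, drops both locks around a blocking
 * step, then re-takes them in the same order to avoid an ABBA deadlock.
 * Returns false if dropping the pin freed obj. */
static bool run_unlocked(struct dom_obj *obj,
                         void (*blocking)(void *), void *opaque)
{
    obj->refs++;                        /* keep obj alive while unlocked */
    pthread_mutex_unlock(&obj->lock);
    pthread_mutex_unlock(&driver_lock);

    blocking(opaque);                   /* e.g. the sleep/poll loop */

    pthread_mutex_lock(&driver_lock);
    pthread_mutex_lock(&obj->lock);
    return dom_unref(obj);              /* drop the temporary reference */
}

/* Simulates (on the same thread) another holder releasing its reference
 * while the locks are dropped; the pin keeps this from freeing obj. */
static void other_holder_unrefs(void *opaque)
{
    struct dom_obj *obj = opaque;

    pthread_mutex_lock(&obj->lock);
    if (dom_unref(obj))
        pthread_mutex_unlock(&obj->lock);
}

static void do_nothing(void *opaque)
{
    (void)opaque;
}
```

    run_unlocked() returns false when it dropped the last reference, so
    a caller must treat the object as gone at that point.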