    main-loop: Acquire main_context lock around os_host_main_loop_wait. · ecbddbb1
    Richard W.M. Jones committed
    When running virt-rescue, the serial console hangs from time to time.
    Virt-rescue runs an ordinary Linux kernel "appliance", but there is
    only a single idle process running inside, so the QEMU main loop is
    largely idle.  With virt-rescue >= 1.37 you may be able to observe the
    hang by doing:
    
      $ virt-rescue -e ^] --scratch
      ><rescue> while true; do ls -l /usr/bin; done
    
    The hang in virt-rescue can be resolved by pressing a key on the
    serial console.
    
    Possibly with the same root cause, we also observed hangs during very
    early boot of regular Linux VMs with a serial console.  Those hangs
    are extremely rare, but you may be able to observe them by running
    this command on bare metal for a sufficiently long time:
    
      $ while libguestfs-test-tool -t 60 >& /tmp/log ; do echo -n . ; done
    
    (Check in /tmp/log that the failure was caused by a hang during early
    boot and not for some other reason.)
    
    During investigation of this bug, Paolo Bonzini wrote:
    
    > glib is expecting QEMU to use g_main_context_acquire around accesses to
    > GMainContext.  However QEMU is not doing that, instead it is taking its
    > own mutex.  So we should add g_main_context_acquire and
    > g_main_context_release in the two implementations of
    > os_host_main_loop_wait; these should undo the effect of Frediano's
    > glib patch.
    
    This patch exactly implements Paolo's suggestion in that paragraph.
    
    This fixes the serial console hang in my testing, across 3 different
    physical machines (AMD, Intel Core i7 and Intel Xeon), over many hours
    of automated testing.  I wasn't able to reproduce the early boot hangs
    (but as noted above, these are extremely rare in any case).
    
    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1435432
    Reported-by: Richard W.M. Jones <rjones@redhat.com>
    Tested-by: Richard W.M. Jones <rjones@redhat.com>
    Signed-off-by: Richard W.M. Jones <rjones@redhat.com>
    Message-Id: <20170331205133.23906-1-rjones@redhat.com>
    [Paolo: this is actually a glib bug: recent glib versions are also
    expecting g_main_context_acquire around g_poll---but that is not
    documented and probably not even intended].
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>