    iothread: fix iothread_stop() race condition · 2362a28e
    Authored by Stefan Hajnoczi
    There is a small chance that iothread_stop() hangs as follows:
    
      Thread 3 (Thread 0x7f63eba5f700 (LWP 16105)):
      #0  0x00007f64012c09b6 in ppoll () at /lib64/libc.so.6
      #1  0x000055959992eac9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimized out>, __fds=<optimized out>) at /usr/include/bits/poll2.h:77
      #2  0x000055959992eac9 in qemu_poll_ns (fds=<optimized out>, nfds=<optimized out>, timeout=<optimized out>) at util/qemu-timer.c:322
      #3  0x0000559599930711 in aio_poll (ctx=0x55959bdb83c0, blocking=blocking@entry=true) at util/aio-posix.c:629
      #4  0x00005595996806fe in iothread_run (opaque=0x55959bd78400) at iothread.c:59
      #5  0x00007f640159f609 in start_thread () at /lib64/libpthread.so.0
      #6  0x00007f64012cce6f in clone () at /lib64/libc.so.6
    
      Thread 1 (Thread 0x7f640b45b280 (LWP 16103)):
      #0  0x00007f64015a0b6d in pthread_join () at /lib64/libpthread.so.0
      #1  0x00005595999332ef in qemu_thread_join (thread=<optimized out>) at util/qemu-thread-posix.c:547
      #2  0x00005595996808ae in iothread_stop (iothread=<optimized out>) at iothread.c:91
      #3  0x000055959968094d in iothread_stop_iter (object=<optimized out>, opaque=<optimized out>) at iothread.c:102
      #4  0x0000559599857d97 in do_object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0, recurse=recurse@entry=false) at qom/object.c:852
      #5  0x0000559599859477 in object_child_foreach (obj=obj@entry=0x55959bdb8100, fn=fn@entry=0x559599680930 <iothread_stop_iter>, opaque=opaque@entry=0x0) at qom/object.c:867
      #6  0x0000559599680a6e in iothread_stop_all () at iothread.c:341
      #7  0x000055959955b1d5 in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4913
    
    The relevant code from iothread_run() is:
    
      while (!atomic_read(&iothread->stopping)) {
          aio_poll(iothread->ctx, true);
    
    and iothread_stop():
    
      iothread->stopping = true;
      aio_notify(iothread->ctx);
      ...
      qemu_thread_join(&iothread->thread);
    
    The following scenario can occur:
    
    1. IOThread:
      while (!atomic_read(&iothread->stopping)) -> stopping=false
    
    2. Main loop:
      iothread->stopping = true;
      aio_notify(iothread->ctx);
    
    3. IOThread:
      aio_poll(iothread->ctx, true); -> hang
    
    The bug is explained by the AioContext->notify_me doc comments:
    
      "If this field is 0, everything (file descriptors, bottom halves,
      timers) will be re-evaluated before the next blocking poll(), thus the
      event_notifier_set call can be skipped."
    
    The problem is that "everything" does not include checking
    iothread->stopping.  This means iothread_run() will block in aio_poll()
    if aio_notify() was called just before aio_poll().
    
    This patch fixes the hang by replacing aio_notify() with
    aio_bh_schedule_oneshot().  The scheduled bottom half forces
    aio_poll() or g_main_loop_run() to return.
    
    Implementing this properly required a new bool running flag.  The new
    flag avoids races that would be tricky to prevent if we tried to reuse
    iothread->stopping.  Now iothread->stopping is written purely by
    iothread_stop() and iothread->running purely by the iothread_run()
    thread.
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Message-id: 20171207201320.19284-6-stefanha@redhat.com
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>