1. 03 Dec, 2018 1 commit
    • mirror: fix dead-lock · d12ade57
      Authored by Vladimir Sementsov-Ogievskiy
      Let's start from the beginning:
      
      Commit b9e413dd (in 2.9)
      "block: explicitly acquire aiocontext in aio callbacks that need it"
      added pairs of aio_context_acquire/release to mirror_write_complete and
      mirror_read_complete, when they were aio callbacks for blk_aio_* calls.
      
      Then, commit 2e1990b2 (in 3.0) "block/mirror: Convert to coroutines"
      dropped these blk_aio_* calls, so mirror_write_complete and
      mirror_read_complete are no longer callbacks and don't need additional
      aiocontext acquiring. Furthermore, mirror_read_complete calls
      blk_co_pwritev inside this pair of aio_context_acquire/release, which
      leads to the following dead-lock with mirror:
      
       (gdb) info thr
         Id   Target Id         Frame
         3    Thread (LWP 145412) "qemu-system-x86" syscall ()
         2    Thread (LWP 145416) "qemu-system-x86" __lll_lock_wait ()
       * 1    Thread (LWP 145411) "qemu-system-x86" __lll_lock_wait ()
      
       (gdb) bt
       #0  __lll_lock_wait ()
       #1  _L_lock_812 ()
       #2  __GI___pthread_mutex_lock
       #3  qemu_mutex_lock_impl (mutex=0x561032dce420 <qemu_global_mutex>,
           file=0x5610327d8654 "util/main-loop.c", line=236) at
           util/qemu-thread-posix.c:66
       #4  qemu_mutex_lock_iothread_impl
       #5  os_host_main_loop_wait (timeout=480116000) at util/main-loop.c:236
       #6  main_loop_wait (nonblocking=0) at util/main-loop.c:497
       #7  main_loop () at vl.c:1892
       #8  main
      
      Printing contents of qemu_global_mutex, I see that "__owner = 145416",
      so, thr1 is main loop, and now it wants BQL, which is owned by thr2.
      
       (gdb) thr 2
       (gdb) bt
       #0  __lll_lock_wait ()
       #1  _L_lock_870 ()
       #2  __GI___pthread_mutex_lock
       #3  qemu_mutex_lock_impl (mutex=0x561034d25dc0, ...
       #4  aio_context_acquire (ctx=0x561034d25d60)
       #5  dma_blk_cb
       #6  dma_blk_io
       #7  dma_blk_read
       #8  ide_dma_cb
       #9  bmdma_cmd_writeb
       #10 bmdma_write
       #11 memory_region_write_accessor
       #12 access_with_adjusted_size
       #15 flatview_write
       #16 address_space_write
       #17 address_space_rw
       #18 kvm_handle_io
       #19 kvm_cpu_exec
       #20 qemu_kvm_cpu_thread_fn
       #21 qemu_thread_start
       #22 start_thread
       #23 clone ()
      
      Printing mutex in fr 2, I see "__owner = 145411", so thr2 wants aio
      context mutex, which is owned by thr1. Classic dead-lock.
      
      Then, let's check that the aio context is held by the mirror coroutine:
      just print the coroutine stack of the first tracked request in the
      mirror job target:
      
       (gdb) [...]
       (gdb) qemu coroutine 0x561035dd0860
       #0  qemu_coroutine_switch
       #1  qemu_coroutine_yield
       #2  qemu_co_mutex_lock_slowpath
       #3  qemu_co_mutex_lock
       #4  qcow2_co_pwritev
       #5  bdrv_driver_pwritev
       #6  bdrv_aligned_pwritev
       #7  bdrv_co_pwritev
       #8  blk_co_pwritev
       #9  mirror_read_complete () at block/mirror.c:232
       #10 mirror_co_read () at block/mirror.c:370
       #11 coroutine_trampoline
       #12 __start_context
      
      Yes, it is mirror_read_complete calling blk_co_pwritev after acquiring
      the aio context.
      Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
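The reasoning in the dead-lock commit message above (read each mutex's `__owner`, then follow who waits for whom) can be sketched as a small wait-for check. This is only an illustration, not QEMU code: the `SketchMutex` and `SketchThread` types and the `sketch_in_deadlock` function are hypothetical names, and the LWP ids are the ones shown in the gdb output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a mutex: only the owner matters here. */
typedef struct SketchMutex {
    const char *name;
    int owner;                      /* LWP id of the holder, 0 when unlocked */
} SketchMutex;

/* Hypothetical model of a thread that may be blocked on one mutex. */
typedef struct SketchThread {
    int lwp;
    const SketchMutex *waiting_on;  /* NULL when the thread is not blocked */
} SketchThread;

static const SketchThread *sketch_find(const SketchThread *threads, size_t n,
                                       int lwp)
{
    for (size_t i = 0; i < n; i++) {
        if (threads[i].lwp == lwp) {
            return &threads[i];
        }
    }
    return NULL;
}

/*
 * Follow the wait-for chain starting at start_lwp. Returns 1 if the chain
 * leads back to start_lwp (a cycle, i.e. a dead-lock), 0 otherwise.
 */
int sketch_in_deadlock(const SketchThread *threads, size_t n, int start_lwp)
{
    assert(threads != NULL);
    const SketchThread *cur = sketch_find(threads, n, start_lwp);

    for (size_t hops = 0; cur && cur->waiting_on && hops < n; hops++) {
        int owner = cur->waiting_on->owner;
        if (owner == start_lwp) {
            return 1;
        }
        cur = sketch_find(threads, n, owner);
    }
    return 0;
}
```

With thread 145411 waiting on a mutex owned by 145416, and 145416 waiting on a mutex owned by 145411, the check reports a cycle for both threads, which matches the "classic dead-lock" conclusion in the message. The thread sitting in `syscall ()` is not part of the cycle.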
  2. 25 Sep, 2018 3 commits
  3. 31 Aug, 2018 3 commits
  4. 15 Aug, 2018 2 commits
  5. 10 Jul, 2018 1 commit
    • block: Use BdrvChild to discard · 0b9fd3f4
      Authored by Fam Zheng
      Other I/O functions are already using a BdrvChild pointer in the API, so
      make discard do the same. It makes it possible to initiate the same
      permission checks before doing I/O, and much easier to share the
      helper functions for this, which will be added and used by write,
      truncate and copy range paths.
      Signed-off-by: Fam Zheng <famz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  6. 18 Jun, 2018 9 commits
  7. 30 May, 2018 1 commit
    • job: Add error message for failing jobs · 1266c9b9
      Authored by Kevin Wolf
      So far we relied on job->ret and strerror() to produce an error message
      for failed jobs. Not surprisingly, this tends to result in completely
      useless messages.
      
      This adds a Job.error field that can contain an error string for a
      failing job, and a parameter to job_completed() that sets the field. As
      a default, if NULL is passed, we continue to use strerror(job->ret).
      
      All existing callers are changed to pass NULL. They can be improved in
      separate patches.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
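The fallback described in the job commit message above (use the supplied error string if there is one, otherwise `strerror(job->ret)`) might look roughly like the following. This is a simplified sketch under stated assumptions, not the actual QEMU implementation: `SketchJob` and `sketch_job_completed` are hypothetical stand-ins, the error is modelled as a plain C string, and `ret` is assumed to be a negative errno value on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced stand-in for a job: only the two fields discussed. */
typedef struct SketchJob {
    int ret;        /* 0 on success, negative errno on failure (assumed) */
    char *error;    /* heap-allocated message, NULL when there is none */
} SketchJob;

/* Portable string duplicate, so the sketch needs only standard C. */
static char *sketch_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy) {
        memcpy(copy, s, len);
    }
    return copy;
}

/*
 * Record the result of a job. If the job failed and the caller passed no
 * message (NULL), fall back to the strerror() text for the error code.
 */
void sketch_job_completed(SketchJob *job, int ret, const char *error)
{
    assert(job != NULL);

    job->ret = ret;
    free(job->error);
    job->error = NULL;

    if (ret < 0) {
        job->error = sketch_strdup(error ? error : strerror(-ret));
    }
}
```

A caller that passes NULL keeps the old behaviour (a generic errno string), while a caller that knows more can pass a specific message. That mirrors the commit's plan of changing all existing callers to NULL first and improving them in later patches.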
  8. 23 May, 2018 18 commits
  9. 15 May, 2018 2 commits