1. 25 9月, 2018 5 次提交
    • K
      job: Use AIO_WAIT_WHILE() in job_finish_sync() · de0fbe64
      Kevin Wolf 提交于
      job_finish_sync() needs to release the AioContext lock of the job before
      calling aio_poll(). Otherwise, callbacks called by aio_poll() would
      possibly take the lock a second time and run into a deadlock with a
      nested AIO_WAIT_WHILE() call.
      
      Also, job_drain() without aio_poll() isn't necessarily enough to make
      progress on a job, it could depend on bottom halves to be executed.
      
      Combine both open-coded while loops into a single AIO_WAIT_WHILE() call
      that solves both of these problems.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NFam Zheng <famz@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      de0fbe64
    • K
      blockjob: Wake up BDS when job becomes idle · 34dc97b9
      Kevin Wolf 提交于
      In the context of draining a BDS, the .drained_poll callback of block
      jobs is called. If this returns true (i.e. there is still some activity
      pending), the drain operation may call aio_poll() with blocking=true to
      wait for completion.
      
      As soon as the pending activity is completed and the job finally arrives
      in a quiescent state (i.e. its coroutine either yields with busy=false
      or terminates), the block job must notify the aio_poll() loop to wake
      up, otherwise we get a deadlock if both are running in different
      threads.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NFam Zheng <famz@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      34dc97b9
    • K
      job: Fix missing locking due to mismerge · d1756c78
      Kevin Wolf 提交于
      job_completed() had a problem with double locking that was recently
      fixed independently by two different commits:
      
      "job: Fix nested aio_poll() hanging in job_txn_apply"
      "jobs: add exit shim"
      
      One fix removed the first aio_context_acquire(), the other fix removed
      the other one. Now we have a bug again and the code is run without any
      locking.
      
      Add it back in one of the places.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Reviewed-by: NJohn Snow <jsnow@redhat.com>
      d1756c78
    • F
      job: Fix nested aio_poll() hanging in job_txn_apply · 49880165
      Fam Zheng 提交于
      All callers have acquired ctx already. Doing that again results in
      aio_poll() hang. This fixes the problem that a BDRV_POLL_WHILE() in the
      callback cannot make progress because ctx is recursively locked, for
      example, when drive-backup finishes.
      
      There are two callers of job_finalize():
      
          fam@lemon:~/work/qemu [master]$ git grep -w -A1 '^\s*job_finalize'
          blockdev.c:    job_finalize(&job->job, errp);
          blockdev.c-    aio_context_release(aio_context);
          --
          job-qmp.c:    job_finalize(job, errp);
          job-qmp.c-    aio_context_release(aio_context);
          --
          tests/test-blockjob.c:    job_finalize(&job->job, &error_abort);
          tests/test-blockjob.c-    assert(job->job.status == JOB_STATUS_CONCLUDED);
      
      Ignoring the test, it's easy to see both callers to job_finalize (and
      job_do_finalize) have acquired the context.
      
      Cc: qemu-stable@nongnu.org
      Reported-by: NGu Nini <ngu@redhat.com>
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NFam Zheng <famz@redhat.com>
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      49880165
    • J
      jobs: remove .exit callback · ccbfb331
      John Snow 提交于
      Now that all of the jobs use the component finalization callbacks,
      there's no use for the heavy-hammer .exit callback anymore.
      
      job_exit becomes a glorified type shim so that we can call
      job_completed from aio_bh_schedule_oneshot.
      
      Move these three functions down into job.c to eliminate a
      forward reference.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20180906130225.5118-12-jsnow@redhat.com
      Reviewed-by: NJeff Cody <jcody@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      ccbfb331
  2. 31 8月, 2018 5 次提交
    • J
      jobs: remove job_defer_to_main_loop · e21a1c98
      John Snow 提交于
      Now that the job infrastructure is handling the job_completed call for
      all implemented jobs, we can remove the interface that allowed jobs to
      schedule their own completion.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20180830015734.19765-10-jsnow@redhat.com
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      e21a1c98
    • J
      jobs: remove ret argument to job_completed; privatize it · 404ff28d
      John Snow 提交于
      Jobs are now expected to return their retcode on the stack, from the
      .run callback, so we can remove that argument.
      
      job_cancel does not need to set -ECANCELED because job_completed will
      update the return code itself if the job was canceled.
      
      While we're here, make job_completed static to job.c and remove it from
      job.h; move the documentation of return code to the .run() callback and
      to the job->ret property, accordingly.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Message-id: 20180830015734.19765-9-jsnow@redhat.com
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      404ff28d
    • J
      jobs: add exit shim · 00359a71
      John Snow 提交于
      All jobs do the same thing when they leave their running loop:
      - Store the return code in a structure
      - wait to receive this structure in the main thread
      - signal job completion via job_completed
      
      Few jobs do anything beyond exactly this. Consolidate this exit
      logic for a net reduction in SLOC.
      
      More seriously, when we utilize job_defer_to_main_loop_bh to call
      a function that calls job_completed, job_finalize_single will run
      in a context where it has recursively taken the aio_context lock,
      which can cause hangs if it puts down a reference that causes a flush.
      
      You can observe this in practice by looking at mirror_exit's careful
      placement of job_completed and bdrv_unref calls.
      
      If we centralize job exiting, we can signal job completion from outside
      of the aio_context, which should allow for job cleanup code to run with
      only one lock, which makes cleanup callbacks less tricky to write.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20180830015734.19765-4-jsnow@redhat.com
      Reviewed-by: NJeff Cody <jcody@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      00359a71
    • J
      jobs: canonize Error object · 3d1f8b07
      John Snow 提交于
      Jobs presently use both an Error object in the case of the create job,
      and char strings in the case of generic errors elsewhere.
      
      Unify the two paths as just j->err, and remove the extra argument from
      job_completed. The integer error code for job_completed is kept for now,
      to be removed shortly in a separate patch.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Message-id: 20180830015734.19765-3-jsnow@redhat.com
      [mreitz: Dropped a superfluous g_strdup()]
      Reviewed-by: NEric Blake <eblake@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      3d1f8b07
    • J
      jobs: change start callback to run callback · f67432a2
      John Snow 提交于
      Presently we codify the entry point for a job as the "start" callback,
      but a more apt name would be "run" to clarify the idea that when this
      function returns we consider the job to have "finished," except for
      any cleanup which occurs in separate callbacks later.
      
      As part of this clarification, change the signature to include an error
      object and a return code. The error ptr is not yet used, and the return
      code while captured, will be overwritten by actions in the job_completed
      function.
      Signed-off-by: NJohn Snow <jsnow@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Message-id: 20180830015734.19765-2-jsnow@redhat.com
      Reviewed-by: NJeff Cody <jcody@redhat.com>
      Signed-off-by: NMax Reitz <mreitz@redhat.com>
      f67432a2
  3. 29 8月, 2018 1 次提交
  4. 22 8月, 2018 1 次提交
  5. 18 6月, 2018 1 次提交
  6. 30 5月, 2018 1 次提交
    • K
      job: Add error message for failing jobs · 1266c9b9
      Kevin Wolf 提交于
      So far we relied on job->ret and strerror() to produce an error message
      for failed jobs. Not surprisingly, this tends to result in completely
      useless messages.
      
      This adds a Job.error field that can contain an error string for a
      failing job, and a parameter to job_completed() that sets the field. As
      a default, if NULL is passed, we continue to use strerror(job->ret).
      
      All existing callers are changed to pass NULL. They can be improved in
      separate patches.
      Signed-off-by: NKevin Wolf <kwolf@redhat.com>
      Reviewed-by: NMax Reitz <mreitz@redhat.com>
      Reviewed-by: NJeff Cody <jcody@redhat.com>
      1266c9b9
  7. 23 5月, 2018 26 次提交