   QEMU Driver Threading: The Rules
   ================================

This document describes how thread safety is ensured throughout
the QEMU driver. The criteria for this model are:

 - Objects must never be exclusively locked for any prolonged time
 - Code which sleeps must be able to time out after a suitable period
 - Must be safe against dispatch of asynchronous events from the monitor


Basic locking primitives
------------------------

There are a number of locks on various objects:

  * virQEMUDriverPtr

    The qemu_conf.h file has inline comments describing the locking
    needs for each field. Any field marked immutable or self-locking
    can be accessed without the driver lock. For other fields there
    are typically helper APIs in qemu_conf.c that provide serialized
    access to the data. No code outside qemu_conf.c should ever
    acquire this lock.

  * virDomainObjPtr

    The object is returned locked, with its reference counter increased,
    by any of the virDomainObjListFindBy{ID,Name,UUID} methods. The
    preferred way of decrementing the reference counter and unlocking the
    domain is the virDomainObjEndAPI() function.

    The lock must be held when changing/reading any variable in the
    virDomainObjPtr.

    This lock must not be held for anything which sleeps/waits (e.g. monitor
    commands).
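
    The lookup/release pairing described above can be sketched with plain
    pthreads. This is only an illustration, not the libvirt implementation:
    DemoDomain, demoDomainInit, demoFindByID and demoEndAPI are hypothetical
    names, and clearing the caller's pointer on release is an assumption of
    the sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a domain object: a mutex plus a reference count. */
typedef struct {
    pthread_mutex_t lock;
    int refs;   /* only touched with lock held in this sketch */
    int id;     /* set once at init, never changed afterwards */
} DemoDomain;

static void demoDomainInit(DemoDomain *dom, int id)
{
    pthread_mutex_init(&dom->lock, NULL);
    dom->refs = 1;   /* the reference owned by the list itself */
    dom->id = id;
}

/* Lookup helper: returns the object locked and with one extra reference,
 * or NULL if no object has the requested id. */
static DemoDomain *demoFindByID(DemoDomain *list, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i].id == id) {
            pthread_mutex_lock(&list[i].lock);
            list[i].refs++;
            return &list[i];
        }
    }
    return NULL;
}

/* Counterpart of the lookup: drops the reference, unlocks, and clears the
 * caller's pointer so it cannot be used by accident afterwards. */
static void demoEndAPI(DemoDomain **dom)
{
    if (!*dom)
        return;
    assert((*dom)->refs > 1);   /* the list still owns one reference */
    (*dom)->refs--;
    pthread_mutex_unlock(&(*dom)->lock);
    *dom = NULL;
}
```

    Passing the address of the pointer lets the release helper reset it,
    which makes any later use of a stale pointer fail fast.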


  * qemuDomainObjPrivatePtr: Job conditions

    Since the virDomainObjPtr lock must not be held during sleeps, the job
    conditions provide additional protection for code making updates.

    The QEMU driver uses three kinds of job conditions: asynchronous, agent
    and normal.

    The asynchronous job condition is used for long-running jobs (such
    as migration) that consist of several monitor commands, where it is
    desirable to allow a limited set of other monitor commands to be
    called while the job is running.  This allows clients to, e.g.,
    query statistical data, cancel the job, or change parameters of the
    job.

    The normal job condition is used by all other jobs to get exclusive
    access to the monitor, and also by every monitor command issued by
    an asynchronous job.  When acquiring the normal job condition, the
    job must specify what kind of action it is about to take, and this
    is checked against the allowed set of jobs in case an asynchronous
    job is running.  If the job is incompatible with the current
    asynchronous job, it needs to wait until the asynchronous job ends
    and then try to acquire the job again.
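
    The compatibility check between a normal job and a running
    asynchronous job can be pictured as a bitmask test. The sketch below
    is an assumption about one reasonable encoding, not the driver's
    actual code; DemoJob, DEMO_JOB_MASK, DemoJobState and demoJobAllowed
    are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical job kinds; the names are illustrative only. */
typedef enum {
    DEMO_JOB_NONE = 0,
    DEMO_JOB_QUERY,
    DEMO_JOB_MODIFY,
    DEMO_JOB_ABORT,
} DemoJob;

/* One bit per job kind, so a set of allowed jobs fits in one integer. */
#define DEMO_JOB_MASK(job) ((job) == DEMO_JOB_NONE ? 0u : 1u << ((job) - 1))

typedef struct {
    bool asyncRunning;          /* is an asynchronous job in progress? */
    unsigned int allowedMask;   /* job kinds permitted while it runs */
} DemoJobState;

/* A normal job may start right away if no async job is running, or if the
 * running async job explicitly allows this kind of job. Otherwise the
 * caller has to wait for the async job to end and check again. */
static bool demoJobAllowed(const DemoJobState *st, DemoJob job)
{
    assert(job != DEMO_JOB_NONE);
    return !st->asyncRunning || (st->allowedMask & DEMO_JOB_MASK(job)) != 0;
}
```

    With a mask that allows only query and abort jobs, a modify job is
    rejected until the asynchronous job has finished.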

    The agent job condition is used when a thread wishes to talk to the
    QEMU guest agent. It is possible to acquire just the agent job
    (qemuDomainObjBeginAgentJob), just the normal job
    (qemuDomainObjBeginJob), or both at the same time
    (qemuDomainObjBeginJobWithAgent). Which type of job to grab depends
    on whether the caller wishes to communicate only with the agent
    socket, only with the QEMU monitor socket, or with both, respectively.

    Immediately after acquiring the virDomainObjPtr lock, any method
    which intends to update state must acquire an asynchronous, normal
    or agent job. The virDomainObjPtr lock is released while blocking on
    these condition variables.  Once the job condition is acquired, a
    method can safely release the virDomainObjPtr lock whenever it hits
    a piece of code which may sleep/wait, and re-acquire it after the
    sleep/wait.  Whenever an asynchronous job wants to talk to the
    monitor, it needs to acquire a nested job (a special kind of normal
    job) to obtain exclusive access to the monitor.

    Since the virDomainObjPtr lock was dropped while waiting for the
    job condition, it is possible that the domain is no longer active
    when the condition is finally obtained.  The monitor lock is only
    safe to grab after verifying that the domain is still active.


  * qemuMonitorPtr:  Mutex

    Lock to be used when invoking any monitor command to ensure safety
    wrt any asynchronous events that may be dispatched from the monitor.
    It should be acquired before running a command.

    The job condition *MUST* be held before acquiring the monitor lock.

    The virDomainObjPtr lock *MUST* be held before acquiring the monitor
    lock.

    The virDomainObjPtr lock *MUST* then be released when invoking the
    monitor command.


Helper methods
--------------

To lock the virDomainObjPtr

  virObjectLock()
    - Acquires the virDomainObjPtr lock

  virObjectUnlock()
    - Releases the virDomainObjPtr lock



To acquire the normal job condition

  qemuDomainObjBeginJob()
    - Waits until the job is compatible with current async job or no
      async job is running
    - Waits on the job.cond condition, using the virDomainObjPtr mutex,
      for as long as 'job.active != 0'
    - Rechecks if the job is still compatible and repeats waiting if it
      isn't
    - Sets job.active to the job type


  qemuDomainObjEndJob()
    - Sets job.active to 0
    - Signals on job.cond condition
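
  The begin/end steps above map onto a mutex and a condition variable.
  The following is a minimal pthread sketch under stated assumptions,
  not the driver's code: DemoObj, demoObjInit, demoBeginJob and
  demoEndJob are hypothetical names, the async compatibility check is
  left out, and the timeout handling is one possible way to satisfy the
  rule that sleeping code must be able to time out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical miniature of a domain object with one job slot. */
typedef struct {
    pthread_mutex_t lock;   /* stands in for the virDomainObjPtr lock */
    pthread_cond_t cond;    /* stands in for job.cond */
    int active;             /* 0 = no job, otherwise the job type */
} DemoObj;

static void demoObjInit(DemoObj *obj)
{
    pthread_mutex_init(&obj->lock, NULL);
    pthread_cond_init(&obj->cond, NULL);
    obj->active = 0;
}

/* Caller holds obj->lock. Waits, up to a deadline, while another job is
 * active; the mutex is released for the duration of each wait.
 * Returns 0 on success, -1 if the deadline passed with the slot busy. */
static int demoBeginJob(DemoObj *obj, int jobType, int timeoutSec)
{
    struct timespec deadline;

    assert(jobType != 0);
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeoutSec;

    while (obj->active != 0) {
        int rc = pthread_cond_timedwait(&obj->cond, &obj->lock, &deadline);
        if (rc == ETIMEDOUT && obj->active != 0)
            return -1;
    }
    obj->active = jobType;
    return 0;
}

/* Caller holds obj->lock. Clears the job slot and wakes one waiter. */
static void demoEndJob(DemoObj *obj)
{
    assert(obj->active != 0);
    obj->active = 0;
    pthread_cond_signal(&obj->cond);
}
```

  The wait sits in a loop because a condition variable can wake up
  spuriously, so the predicate has to be rechecked after every wakeup.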



To acquire the agent job condition

  qemuDomainObjBeginAgentJob()
    - Waits until there is no other agent job set
    - Sets job.agentActive to the job type

  qemuDomainObjEndAgentJob()
    - Sets job.agentActive to 0
    - Signals on job.cond condition



To acquire both normal and agent job condition

  qemuDomainObjBeginJobWithAgent()
    - Waits until there is no normal and no agent job set
    - Sets both job.active and job.agentActive with required job types

  qemuDomainObjEndJobWithAgent()
    - Sets both job.active and job.agentActive to 0
    - Signals on job.cond condition



To acquire the asynchronous job condition

  qemuDomainObjBeginAsyncJob()
    - Waits until no async job is running
    - Waits on the job.cond condition, using the virDomainObjPtr mutex,
      for as long as 'job.active != 0'
    - Rechecks if any async job was started while waiting on job.cond
      and repeats waiting in that case
    - Sets job.asyncJob to the asynchronous job type


  qemuDomainObjEndAsyncJob()
    - Sets job.asyncJob to 0
    - Broadcasts on job.asyncCond condition
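
  The two-stage wait and the recheck listed above can be sketched as
  follows. This is an illustration under assumptions rather than the
  driver's implementation: DemoJobs, demoJobsInit, demoBeginAsyncJob and
  demoEndAsyncJob are hypothetical names, and timeouts are omitted to
  keep the control flow visible.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical job bookkeeping for one domain object. */
typedef struct {
    pthread_mutex_t lock;      /* stands in for the virDomainObjPtr lock */
    pthread_cond_t cond;       /* signalled when a normal job ends */
    pthread_cond_t asyncCond;  /* broadcast when an async job ends */
    int active;                /* current normal job, 0 if none */
    int asyncJob;              /* current async job, 0 if none */
} DemoJobs;

static void demoJobsInit(DemoJobs *j)
{
    pthread_mutex_init(&j->lock, NULL);
    pthread_cond_init(&j->cond, NULL);
    pthread_cond_init(&j->asyncCond, NULL);
    j->active = 0;
    j->asyncJob = 0;
}

/* Caller holds j->lock; both waits release it while blocked. */
static void demoBeginAsyncJob(DemoJobs *j, int asyncType)
{
    assert(asyncType != 0);
    for (;;) {
        /* Stage 1: wait until no async job is running. */
        while (j->asyncJob != 0)
            pthread_cond_wait(&j->asyncCond, &j->lock);
        /* Stage 2: wait until no normal job is active. */
        while (j->active != 0)
            pthread_cond_wait(&j->cond, &j->lock);
        /* The lock was dropped during stage 2, so another async job may
         * have started meanwhile; recheck before claiming the slot. */
        if (j->asyncJob == 0)
            break;
    }
    j->asyncJob = asyncType;
}

/* Caller holds j->lock. Every waiter is woken because several threads
 * may be blocked on jobs that the finished async job did not allow. */
static void demoEndAsyncJob(DemoJobs *j)
{
    assert(j->asyncJob != 0);
    j->asyncJob = 0;
    pthread_cond_broadcast(&j->asyncCond);
}
```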



To acquire the QEMU monitor lock

  qemuDomainObjEnterMonitor()
    - Acquires the qemuMonitorPtr lock
    - Releases the virDomainObjPtr lock

  qemuDomainObjExitMonitor()
    - Releases the qemuMonitorPtr lock
    - Acquires the virDomainObjPtr lock

  These functions must not be used by an asynchronous job.
  Note that the virDomainObj is unlocked during the time in the
  monitor and it can be changed, e.g. if QEMU dies, qemuProcessStop
  may free the live domain definition and put the persistent
  definition back in vm->def. The callers should check the return
  value of ExitMonitor to see if the domain is still alive.
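
  The lock handoff and the liveness check on exit can be sketched like
  this. It is a simplified illustration, not the driver's code:
  DemoMonitor, DemoDomainObj, demoDomainObjInit, demoEnterMonitor and
  demoExitMonitor are hypothetical names, and the sketch assumes the
  monitor pointer is set once and never replaced.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t lock;   /* stands in for the qemuMonitorPtr lock */
} DemoMonitor;

typedef struct {
    pthread_mutex_t lock;   /* stands in for the virDomainObjPtr lock */
    bool active;            /* is the guest still running? */
    DemoMonitor *mon;       /* fixed after init in this sketch */
} DemoDomainObj;

static void demoDomainObjInit(DemoDomainObj *obj, DemoMonitor *mon)
{
    pthread_mutex_init(&mon->lock, NULL);
    pthread_mutex_init(&obj->lock, NULL);
    obj->active = true;
    obj->mon = mon;
}

/* Caller holds obj->lock and owns a job. The monitor lock is taken first
 * and only then is the object lock dropped, so on entry there is no
 * window in which neither lock is held. */
static void demoEnterMonitor(DemoDomainObj *obj)
{
    assert(obj->mon != NULL);
    pthread_mutex_lock(&obj->mon->lock);
    pthread_mutex_unlock(&obj->lock);
}

/* Reverse handoff. The object was unlocked while in the monitor, so its
 * state may have changed; returns -1 if the domain is no longer active. */
static int demoExitMonitor(DemoDomainObj *obj)
{
    pthread_mutex_unlock(&obj->mon->lock);
    pthread_mutex_lock(&obj->lock);
    return obj->active ? 0 : -1;
}
```

  A caller that sees -1 from the exit helper should skip any further
  work that assumes a running guest and go straight to cleanup.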


To acquire the QEMU monitor lock as part of an asynchronous job

  qemuDomainObjEnterMonitorAsync()
    - Validates that the right async job is still running
    - Acquires the qemuMonitorPtr lock
    - Releases the virDomainObjPtr lock
    - Validates that the VM is still active

  qemuDomainObjExitMonitor()
    - Releases the qemuMonitorPtr lock
    - Acquires the virDomainObjPtr lock

  These functions are for use inside an asynchronous job; the caller
  must check for a return of -1 (VM not running, so nothing to exit).
  Helper functions may also call this with QEMU_ASYNC_JOB_NONE when
  used from a sync job (such as when first starting a domain).


To keep a domain alive while waiting on a remote command

  qemuDomainObjEnterRemote()
    - Releases the virDomainObjPtr lock

  qemuDomainObjExitRemote()
    - Acquires the virDomainObjPtr lock


Design patterns
---------------


 * Accessing something directly to do with a virDomainObjPtr

     virDomainObjPtr obj;

     obj = qemuDomObjFromDomain(dom);

     ...do work...

     virDomainObjEndAPI(&obj);


 * Updating something directly to do with a virDomainObjPtr

     virDomainObjPtr obj;

     obj = qemuDomObjFromDomain(dom);

     qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);

     ...do work...

     qemuDomainObjEndJob(obj);

     virDomainObjEndAPI(&obj);


 * Invoking a monitor command on a virDomainObjPtr

     virDomainObjPtr obj;
     qemuDomainObjPrivatePtr priv;

     obj = qemuDomObjFromDomain(dom);
     priv = obj->privateData;

     qemuDomainObjBeginJob(obj, QEMU_JOB_TYPE);

     ...do prep work...

     if (virDomainObjIsActive(obj)) {
         qemuDomainObjEnterMonitor(obj);
         qemuMonitorXXXX(priv->mon);
         qemuDomainObjExitMonitor(obj);
     }

     ...do final work...

     qemuDomainObjEndJob(obj);
     virDomainObjEndAPI(&obj);


 * Invoking an agent command on a virDomainObjPtr

     virDomainObjPtr obj;
     qemuAgentPtr agent;

     obj = qemuDomObjFromDomain(dom);

     qemuDomainObjBeginAgentJob(obj, QEMU_AGENT_JOB_TYPE);

     ...do prep work...

     if (!qemuDomainAgentAvailable(obj, true))
         goto cleanup;

     agent = qemuDomainObjEnterAgent(obj);
     qemuAgentXXXX(agent, ..);
     qemuDomainObjExitAgent(obj, agent);

     ...do final work...

     qemuDomainObjEndAgentJob(obj);
     virDomainObjEndAPI(&obj);


 * Invoking both monitor and agent commands on a virDomainObjPtr

     virDomainObjPtr obj;
     qemuDomainObjPrivatePtr priv;
     qemuAgentPtr agent;

     obj = qemuDomObjFromDomain(dom);
     priv = obj->privateData;

     qemuDomainObjBeginJobWithAgent(obj, QEMU_JOB_TYPE, QEMU_AGENT_JOB_TYPE);

     if (!virDomainObjIsActive(obj))
         goto cleanup;

     ...do prep work...

     if (!qemuDomainAgentAvailable(obj, true))
         goto cleanup;

     agent = qemuDomainObjEnterAgent(obj);
     qemuAgentXXXX(agent, ..);
     qemuDomainObjExitAgent(obj, agent);

     ...

     qemuDomainObjEnterMonitor(obj);
     qemuMonitorXXXX(priv->mon);
     qemuDomainObjExitMonitor(obj);

     /* Alternatively, talk to the monitor first and then talk to the agent. */

     ...do final work...

     qemuDomainObjEndJobWithAgent(obj);
     virDomainObjEndAPI(&obj);


 * Running an asynchronous job

     virDomainObjPtr obj;
     qemuDomainObjPrivatePtr priv;

     obj = qemuDomObjFromDomain(dom);

     qemuDomainObjBeginAsyncJob(obj, QEMU_ASYNC_JOB_TYPE);
     qemuDomainObjSetAsyncJobMask(obj, allowedJobs);

     ...do prep work...

     if (qemuDomainObjEnterMonitorAsync(driver, obj,
                                        QEMU_ASYNC_JOB_TYPE) < 0) {
         /* domain died in the meantime */
         goto error;
     }
     ...start qemu job...
     qemuDomainObjExitMonitor(driver, obj);

     while (!finished) {
         if (qemuDomainObjEnterMonitorAsync(driver, obj,
                                            QEMU_ASYNC_JOB_TYPE) < 0) {
             /* domain died in the meantime */
             goto error;
         }
         ...monitor job progress...
         qemuDomainObjExitMonitor(driver, obj);

         virObjectUnlock(obj);
         sleep(aWhile);
         virObjectLock(obj);
     }

     ...do final work...

     qemuDomainObjEndAsyncJob(obj);
     virDomainObjEndAPI(&obj);


 * Coordinating with a remote server for migration

     virDomainObjPtr obj;
     qemuDomainObjPrivatePtr priv;

     obj = qemuDomObjFromDomain(dom);

     qemuDomainObjBeginAsyncJob(obj, QEMU_ASYNC_JOB_TYPE);

     ...do prep work...

     if (virDomainObjIsActive(obj)) {
         qemuDomainObjEnterRemote(obj);
         ...communicate with remote...
         qemuDomainObjExitRemote(obj);
         /* domain may have been stopped while we were talking to remote */
         if (!virDomainObjIsActive(obj)) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("guest unexpectedly quit"));
         }
     }

     ...do final work...

     qemuDomainObjEndAsyncJob(obj);
     virDomainObjEndAPI(&obj);