Parsing
=======

Parsing in Sky is a strict pipeline consisting of five stages:

- decoding, which converts incoming bytes into Unicode characters
  using UTF-8.

- normalising, which manipulates the sequence of characters.

- tokenising, which converts these characters into three kinds of
  tokens: character tokens, start tag tokens, and end tag tokens.
  Character tokens have a single character value. Tag tokens have a
  tag name, and a list of name/value pairs known as attributes.

- token cleanup, which converts sequences of character tokens into
  string tokens, and removes duplicate attributes in tag tokens.

- tree construction, which converts these tokens into a tree of nodes.

Later stages cannot affect earlier stages.

When a sequence of bytes is to be parsed, there is always a defined
_parsing context_, which is either an Application object or a Module
object.


Decoding stage
--------------

To decode a sequence of bytes _bytes_ for parsing, the [UTF-8
decoder](https://encoding.spec.whatwg.org/#utf-8-decoder) must be used
to transform _bytes_ into a sequence of characters _characters_.

This sequence must then be passed to the normalisation stage.


Normalisation stage
-------------------

To normalise a sequence of characters, apply the following rules:

* Any U+000D character followed by a U+000A character must be removed.

* Any U+000D character not followed by a U+000A character must be
  converted to a U+000A character.

* Any U+0000 character must be converted to a U+FFFD character.

The converted sequence of characters must then be passed to the
tokenisation stage.
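
The decoding and normalisation rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, and Python's built-in UTF-8 codec with `errors="replace"` is assumed to be a close enough stand-in for the Encoding Standard's UTF-8 decoder.

```python
def decode(data: bytes) -> str:
    # Assumption: malformed byte sequences are replaced with U+FFFD.
    return data.decode("utf-8", errors="replace")


def normalise(characters: str) -> str:
    # Remove any U+000D that is followed by U+000A.
    characters = characters.replace("\r\n", "\n")
    # Every remaining U+000D is not followed by U+000A: convert it.
    characters = characters.replace("\r", "\n")
    # Convert U+0000 to U+FFFD.
    return characters.replace("\x00", "\ufffd")
```

Replacing the CR LF pairs first means every U+000D that remains is, by construction, one that is not followed by U+000A.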


Tokenisation stage
------------------

To tokenise a sequence of characters, a state machine is used.

Initially, the state machine must begin in the **signature** state.

Each character in turn must be processed according to the rules of the
state at the time the character is processed. A character is processed
once it has been _consumed_. This produces a stream of tokens; the
tokens must be passed to the token cleanup stage.

When the last character is consumed, the tokeniser ends.


### Expecting a string ###

When the user agent is to _expect a string_, it must run these steps:

1. Let _expectation_ be the string to expect. When this string is
   indexed, the first character has index 0.

2. Assertion: The first character in _expectation_ is the current
   character, and _expectation_ has more than one character.

3. Consume the current character.

4. Let _index_ be 1.

5. Let _success_ and _failure_ be the states specified for success and
   failure respectively.

6. Switch to the **expect a string** state.


### Tokeniser states ###

#### **Signature** state ####

If the current character is...

* '``#``': If the _parsing context_ is not an Application, switch to
  the _failed signature_ state. Otherwise, expect the string
  "``#!mojo mojo:sky``", with _after signature_ as the _success_
  state and _failed signature_ as the _failure_ state.

* '``S``': If the _parsing context_ is not a Module, switch to the
  _failed signature_ state. Otherwise, expect the string
  "``SKY MODULE``", with _after signature_ as the _success_ state,
  and _failed signature_ as the _failure_ state.

* Anything else: Switch to the **failed signature** state.


#### **Expect a string** state ####

If the current character is not the same as the <i>index</i>th character in
_expectation_, then switch to the _failure_ state.

Otherwise, consume the character, and increase _index_. If _index_ is
now equal to the length of _expectation_, then switch to the _success_
state.


#### **After signature** state ####

If the current character is...

* U+000A: Consume the character and switch to the **data** state.
* U+0020: Consume the character and switch to the **consume rest of
  line** state.
* Anything else: Switch to the **failed signature** state.


#### **Failed signature** state ####

Stop parsing. No tokens are emitted. The file is not a Sky file.


#### **Consume rest of line** state ####

If the current character is...

* U+000A: Consume the character and switch to the **data** state.
* Anything else: Consume the character and stay in this state.
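
Taken together, the signature, expect a string, after signature, and consume rest of line states amount to a prefix check followed by skipping the remainder of the first line. The sketch below illustrates that reading. The function name, the `"application"`/`"module"` context labels, and the convention of returning `None` for the failed signature state are all assumptions made for this example.

```python
SIGNATURES = {"application": "#!mojo mojo:sky", "module": "SKY MODULE"}


def skip_signature(chars, context):
    """Return the index of the first data-state character, or None on failure."""
    expectation = SIGNATURES[context]
    # Signature and expect a string states: compare one character at a time.
    for index, expected in enumerate(expectation):
        if index >= len(chars):
            return len(chars)  # input ran out, so the tokeniser simply ends
        if chars[index] != expected:
            return None  # failed signature state
    pos = len(expectation)
    if pos >= len(chars):
        return len(chars)
    # After signature state.
    if chars[pos] == "\n":
        return pos + 1
    if chars[pos] != " ":
        return None
    # Consume rest of line state.
    newline = chars.find("\n", pos)
    return len(chars) if newline == -1 else newline + 1
```

For example, `skip_signature("SKY MODULE v1\nrest", "module")` returns the index at which `rest` begins, while the same input with the `"application"` context returns `None`.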


#### **Data** state ####

If the current character is...

* '``<``': Consume the character and switch to the **tag open** state.

* '``&``': Consume the character and switch to the **character
  reference** state, with the _return state_ set to the **data**
  state, and the _emitting operation_ being to emit a character token
  for the given character.

* Anything else: Emit the current input character as a character
  token. Consume the character. Stay in this state.


#### **Script raw data** state ####

If the current character is...

* '``<``': Consume the character and switch to the **script raw
  data: close 1** state.

* Anything else: Emit the current input character as a character
  token. Consume the character. Stay in this state.


#### **Script raw data: close 1** state ####

If the current character is...

* '``/``': Consume the character and switch to the **script raw
  data: close 2** state.

* Anything else: Emit '``<``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 2** state ####

If the current character is...

* '``s``': Consume the character and switch to the **script raw
  data: close 3** state.

* Anything else: Emit '``</``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 3** state ####

If the current character is...

* '``c``': Consume the character and switch to the **script raw
  data: close 4** state.

* Anything else: Emit '``</s``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 4** state ####

If the current character is...

* '``r``': Consume the character and switch to the **script raw
  data: close 5** state.

* Anything else: Emit '``</sc``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 5** state ####

If the current character is...

* '``i``': Consume the character and switch to the **script raw
  data: close 6** state.

* Anything else: Emit '``</scr``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 6** state ####

If the current character is...

* '``p``': Consume the character and switch to the **script raw
  data: close 7** state.

* Anything else: Emit '``</scri``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 7** state ####

If the current character is...

* '``t``': Consume the character and switch to the **script raw
  data: close 8** state.

* Anything else: Emit '``</scrip``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Script raw data: close 8** state ####

If the current character is...

* U+0020, U+000A, '``/``', '``>``': Create an end tag token, and
  let its tag name be the string '``script``'. Switch to the
  **before attribute name** state without consuming the character.

* Anything else: Emit '``</script``' character tokens. Consume the
  character. Switch to the **script raw data** state.


#### **Style raw data** state ####

If the current character is...

* '``<``': Consume the character and switch to the **style raw
  data: close 1** state.

* Anything else: Emit the current input character as a character
  token. Consume the character. Stay in this state.


#### **Style raw data: close 1** state ####

If the current character is...

* '``/``': Consume the character and switch to the **style raw
  data: close 2** state.

* Anything else: Emit '``<``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 2** state ####

If the current character is...

* '``s``': Consume the character and switch to the **style raw
  data: close 3** state.

* Anything else: Emit '``</``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 3** state ####

If the current character is...

* '``t``': Consume the character and switch to the **style raw
  data: close 4** state.

* Anything else: Emit '``</s``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 4** state ####

If the current character is...

* '``y``': Consume the character and switch to the **style raw
  data: close 5** state.

* Anything else: Emit '``</st``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 5** state ####

If the current character is...

* '``l``': Consume the character and switch to the **style raw
  data: close 6** state.

* Anything else: Emit '``</sty``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 6** state ####

If the current character is...

* '``e``': Consume the character and switch to the **style raw
  data: close 7** state.

* Anything else: Emit '``</styl``' character tokens. Consume the
  character. Switch to the **style raw data** state.


#### **Style raw data: close 7** state ####

If the current character is...

* U+0020, U+000A, '``/``', '``>``': Create an end tag token, and
  let its tag name be the string '``style``'. Switch to the
  **before attribute name** state without consuming the character.

* Anything else: Emit '``</style``' character tokens. Consume the
  character. Switch to the **style raw data** state.
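
The chains of close states for ``script`` and ``style`` each accept one literal sequence: '``</``', the tag name in lowercase, and then one of the four delimiter characters. The sketch below shows only that acceptance condition. It does not model what is emitted when a chain is abandoned, and the function name and signature are hypothetical.

```python
def raw_end_tag_at(chars, pos, tag_name):
    """True if the '<' at chars[pos] starts an end tag that leaves raw data."""
    closer = "</" + tag_name  # e.g. "</script" or "</style"
    after = pos + len(closer)
    return (
        chars.startswith(closer, pos)  # the close 1..N states, matched exactly
        and after < len(chars)
        and chars[after] in " \n/>"  # delimiters checked by the final close state
    )
```

The match is case-sensitive because the states compare against lowercase letters only, so ``</SCRIPT>`` does not end script raw data under this reading.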


#### **Tag open** state ####

If the current character is...

* '``!``': Consume the character and switch to the **comment start
  1** state.

* '``/``': Consume the character and switch to the **close tag**
  state.

* '``>``': Emit character tokens for '``<>``'. Consume the current
  character. Switch to the **data** state.

* '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``',
  '``-``', '``_``', '``.``': Create a start tag token, let its
  tag name be the current character, consume the current character and
  switch to the **tag name** state.

* Anything else: Emit the character token for '``<``'. Switch to the
  **data** state without consuming the current character.


#### **Close tag** state ####

If the current character is...

* '``>``': Emit character tokens for '``</>``'. Consume the current
  character. Switch to the **data** state.

* '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``',
  '``-``', '``_``', '``.``': Create an end tag token, let its
  tag name be the current character, consume the current character and
  switch to the **tag name** state.

* Anything else: Emit the character tokens for '``</``'. Switch to
  the **data** state without consuming the current character.
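
Both the tag open and close tag states use the same set of characters to decide whether a tag name begins. A minimal sketch of that test follows, with a hypothetical helper name.

```python
import string

# '0'..'9', 'a'..'z', 'A'..'Z', '-', '_' and '.'
TAG_NAME_START = frozenset(string.ascii_letters + string.digits + "-_.")


def is_tag_name_start(ch):
    """True if ch may begin a tag name after '<' or '</'."""
    return ch in TAG_NAME_START
```

An explicit set is used instead of `str.isalnum()` because the latter also accepts non-ASCII letters and digits, which the states above do not list.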


#### **Tag name** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Switch to the
  **before attribute name** state.

* '``/``': Consume the current character. Switch to the **void tag**
  state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* Anything else: Append the current character to the tag name, and
  consume the current character. Stay in this state.


#### **Void tag** state ####

If the current character is...

* '``>``': Consume the current character. Switch to the **after void
  tag** state.

* Anything else: Switch to the **before attribute name** state without
  consuming the current character.


#### **Before attribute name** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Stay in this state.

* '``/``': Consume the current character. Switch to the **void tag**
  state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* Anything else: Create a new attribute in the tag token, and set its
  name to the current character. Consume the current character. Switch
  to the **attribute name** state.


#### **Attribute name** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Switch to the **after
  attribute name** state.

* '``/``': Consume the current character. Switch to the **void tag**
  state.

* '``=``': Consume the current character. Switch to the **before
  attribute value** state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* Anything else: Append the current character to the most recently
  added attribute's name, and consume the current character. Stay in
  this state.


#### **After attribute name** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Stay in this state.

* '``/``': Consume the current character. Switch to the **void tag**
  state.

* '``=``': Consume the current character. Switch to the **before
  attribute value** state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* Anything else: Create a new attribute in the tag token, and set its
  name to the current character. Consume the current character. Switch
  to the **attribute name** state.


#### **Before attribute value** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Stay in this state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* '``'``': Consume the current character. Switch to the
  **single-quoted attribute value** state.

* '``"``': Consume the current character. Switch to the
  **double-quoted attribute value** state.

* Anything else: Set the value of the most recently added attribute to
  the current character. Consume the current character. Switch to the
  **unquoted attribute value** state.


#### **Single-quoted attribute value** state ####

If the current character is...

* '``'``': Consume the current character. Switch to the
  **before attribute name** state.

* '``&``': Consume the character and switch to the **character
  reference** state, with the _return state_ set to the
  **single-quoted attribute value** state and the _emitting operation_
  being to append the given character to the value of the most
  recently added attribute.

* Anything else: Append the current character to the value of the most
  recently added attribute. Consume the current character. Stay in
  this state.


#### **Double-quoted attribute value** state ####

If the current character is...

* '``"``': Consume the current character. Switch to the
  **before attribute name** state.

* '``&``': Consume the character and switch to the **character
  reference** state, with the _return state_ set to the
  **double-quoted attribute value** state and the _emitting operation_
  being to append the given character to the value of the most
  recently added attribute.

* Anything else: Append the current character to the value of the most
  recently added attribute. Consume the current character. Stay in
  this state.


#### **Unquoted attribute value** state ####

If the current character is...

* U+0020, U+000A: Consume the current character. Switch to the
  **before attribute name** state.

* '``>``': Consume the current character. Switch to the **after
  tag** state.

* '``&``': Consume the character and switch to the **character
  reference** state, with the _return state_ set to the **unquoted
  attribute value** state, and the _emitting operation_ being to
  append the given character to the value of the most recently added
  attribute.

* Anything else: Append the current character to the value of the most
  recently added attribute. Consume the current character. Stay in
  this state.


#### **After tag** state ####

Emit the tag token.

If the tag token was a start tag token and the tag name was
'``script``', then switch to the **script raw data** state.

If the tag token was a start tag token and the tag name was
'``style``', then switch to the **style raw data** state.

Otherwise, switch to the **data** state.


#### **After void tag** state ####

Emit the tag token.

If the tag token is a start tag token, emit an end tag token with the
same tag name.

Switch to the **data** state.
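
The after tag and after void tag states can be summarised as a function from the finished tag token to the tokens emitted and the next state. In the sketch below, the tuple representation `(kind, name, attributes)`, the `void` flag, and the empty attribute list on the synthesised end tag are assumptions made for illustration.

```python
def after_tag(token, void=False):
    """Return (tokens_to_emit, next_state) for a completed tag token."""
    kind, name, _attributes = token
    if void:
        # After void tag state: a start tag is followed by a matching end tag.
        emitted = [token]
        if kind == "start":
            emitted.append(("end", name, []))
        return emitted, "data"
    # After tag state: script and style start tags switch to raw data.
    if kind == "start" and name == "script":
        return [token], "script raw data"
    if kind == "start" and name == "style":
        return [token], "style raw data"
    return [token], "data"
```

Under this reading, a void ``script`` start tag is handled entirely by the after void tag state, so it never enters script raw data.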


#### **Comment start 1** state ####

If the current character is...

* '``-``': Consume the character and switch to the **comment start
  2** state.

* '``>``': Emit character tokens for '``<!>``'. Consume the
  current character. Switch to the **data** state.


#### **Comment start 2** state ####

If the current character is...

* '``-``': Consume the character and switch to the **comment**
  state.

* '``>``': Emit character tokens for '``<!->``'. Consume the
  current character. Switch to the **data** state.


#### **Comment** state ####

If the current character is...

* '``-``': Consume the character and switch to the **comment end 1**
  state.

* Anything else: Consume the character and stay in this state.


#### **Comment end 1** state ####

If the current character is...

* '``-``': Consume the character, switch to the **comment end 2**
  state.

* Anything else: Consume the character, and switch to the **comment**
  state.


#### **Comment end 2** state ####

If the current character is...

* '``>``': Consume the character and switch to the **data** state.

* '``-``': Consume the character, but stay in this state.

* Anything else: Consume the character, and switch to the **comment**
  state.
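
Once '``<!--``' has been consumed, the comment, comment end 1, and comment end 2 states only track how many consecutive '``-``' characters have just been seen, so the comment ends at the first '``-->``' in the comment body. The sketch below relies on that equivalence. The function name is hypothetical, and `body_start` is assumed to be the index just after '``<!--``'.

```python
def skip_comment(chars, body_start):
    """Return the index of the first character after the comment."""
    end = chars.find("-->", body_start)
    # An unterminated comment swallows the rest of the input.
    return len(chars) if end == -1 else end + 3
```

Extra dashes before the closing '``>``' are absorbed, matching the rule that '``-``' stays in the comment end 2 state. By contrast '``<!--->``' does not close the comment, because only one dash follows the opening sequence.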


#### **Character reference** state ####

Let _raw value_ be the string '``&``'.

Append the current character to _raw value_.

If the current character is...

* '``#``': Consume the character, and switch to the **numeric
  character reference** state.

* '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': switch to the
  **named character reference** state without consuming the current
  character.

* Anything else: Run the _emitting operation_ for all but the last
  character in _raw value_, and switch to the **data** state without
  consuming the current character.


#### **Numeric character reference** state ####

Append the current character to _raw value_.

If the current character is...

* '``x``', '``X``': Let _value_ be zero, consume the character,
  and switch to the **hexadecimal numeric character reference** state.

* '``0``'..'``9``': Let _value_ be the numeric value of the
  current character interpreted as a decimal digit, consume the
  character, and switch to the **decimal numeric character reference**
  state.

* Anything else: Run the _emitting operation_ for all but the last
  character in _raw value_, and switch to the **data** state without
  consuming the current character.


#### **Hexadecimal numeric character reference** state ####

Append the current character to _raw value_.

If the current character is...

* '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``':
  Let _value_ be sixteen times _value_ plus the numeric value of the
  current character interpreted as a hexadecimal digit. Consume the
  character and stay in this state.

* '``;``': Consume the character. If _value_ is between 0x0001 and
  0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive,
  run the _emitting operation_ with a Unicode character having the
  scalar value _value_; otherwise, run the _emitting operation_ with
  the character U+FFFD. Then, in either case, switch to the _return
  state_.

* Anything else: Run the _emitting operation_ for all but the last
  character in _raw value_, and switch to the **data** state without
  consuming the current character.


#### **Decimal numeric character reference** state ####

Append the current character to _raw value_.

If the current character is...

* '``0``'..'``9``': Let _value_ be ten times _value_ plus the
  numeric value of the current character interpreted as a decimal
  digit. Consume the character and stay in this state.

* '``;``': Consume the character. If _value_ is between 0x0001 and
  0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive,
  run the _emitting operation_ with a Unicode character having the
  scalar value _value_; otherwise, run the _emitting operation_ with
  the character U+FFFD. Then, in either case, switch to the _return
  state_.

* Anything else: Run the _emitting operation_ for all but the last
  character in _raw value_, and switch to the **data** state without
  consuming the current character.
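
The arithmetic shared by the hexadecimal and decimal states can be sketched as two small helpers: one accumulates _value_ digit by digit, the other applies the range check performed when '``;``' is reached. The helper names are hypothetical.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def reference_value(digits, base):
    """Accumulate digits as the states do: value = base * value + digit."""
    value = 0
    for digit in digits:
        value = value * base + int(digit, base)
    return value


def reference_character(value):
    """Map an accumulated value to the character passed to the emitting operation."""
    in_range = 0x0001 <= value <= 0x10FFFF
    is_surrogate = 0xD800 <= value <= 0xDFFF
    if in_range and not is_surrogate:
        return chr(value)
    return REPLACEMENT_CHARACTER
```

For instance, `reference_character(reference_value("41", 16))` gives `"A"`, while zero, surrogate values, and values above 0x10FFFF all give U+FFFD.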


#### **Named character reference** state ####

Append the current character to _raw value_.

If the current character is...

* '``;``': Consume the character.
  If the _raw value_ is...

  - '``&amp;``': Run the _emitting operation_ for the character
    '``&``'.

  - '``&apos;``': Run the _emitting operation_ for the character
    '``'``'.

  - '``&gt;``': Run the _emitting operation_ for the character
    '``>``'.

  - '``&lt;``': Run the _emitting operation_ for the character
    '``<``'.

  - '``&quot;``': Run the _emitting operation_ for the character
    '``"``'.

  Then, switch to the _return state_.

* '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': Consume the
  character and stay in this state.

* Anything else: Run the _emitting operation_ for all but the last
  character in _raw value_, and switch to the **data** state without
  consuming the current character.
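
The five named references recognised above fit in a small lookup table keyed by the complete _raw value_, including the leading '``&``' and trailing '``;``'. This is a sketch with hypothetical names. Returning `None` for any other name is an assumption, since the text does not say what happens in that case.

```python
NAMED_REFERENCES = {
    "&amp;": "&",
    "&apos;": "'",
    "&gt;": ">",
    "&lt;": "<",
    "&quot;": '"',
}


def named_reference_character(raw_value):
    """Return the character for a recognised raw value, else None (assumed)."""
    return NAMED_REFERENCES.get(raw_value)
```

The lookup is case-sensitive, so `named_reference_character("&AMP;")` returns `None` in this sketch.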


Token cleanup stage
-------------------

Replace each sequence of character tokens with a single string token
whose value is the concatenation of all the characters in the
character tokens.

For each start tag token, remove all but the first name/value pair for
each name (i.e. remove duplicate attributes, keeping only the first
one).

For each end tag token, remove the attributes entirely.

If the token is a start tag token, notify the JavaScript token stream
callback of the token.

Then, pass the tokens to the tree construction stage.
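
A minimal sketch of the cleanup rules follows. The token representation is an assumption made for this example: character tokens are `("char", c)` tuples and tag tokens are `("start" | "end", name, [(attr_name, value), ...])` tuples. Notifying the JavaScript token stream callback is left out.

```python
def clean_up(tokens):
    cleaned = []
    pending = []  # characters from the current run of character tokens

    def flush():
        # Turn the pending run of characters into one string token.
        if pending:
            cleaned.append(("string", "".join(pending)))
            pending.clear()

    for token in tokens:
        if token[0] == "char":
            pending.append(token[1])
            continue
        flush()
        kind, name, attributes = token
        if kind == "start":
            first_values = {}
            for attr_name, value in attributes:
                # setdefault keeps the first value seen for each name.
                first_values.setdefault(attr_name, value)
            cleaned.append(("start", name, list(first_values.items())))
        else:
            # End tag tokens lose their attributes entirely.
            cleaned.append(("end", name, []))
    flush()
    return cleaned
```

Because Python dictionaries preserve insertion order, the surviving attributes stay in the order in which their names first appeared.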


Tree construction stage
-----------------------

To construct a node tree from a _sequence of tokens_ and a document
_document_:

1. Initialize the _stack of open nodes_ to be _document_.
2. Consider each token _token_ in the _sequence of tokens_ in turn, as
   follows. If a token is to be skipped, then jump straight to the
   next token, without doing any more work with the skipped token.
   - If _token_ is a string token,
     1. If the value of the token contains only U+0020 and U+000A
        characters, and there is no ``t`` element on the _stack of
        open nodes_, then skip the token.
     2. Create a text node _node_ whose character data is the value of
        the token.
     3. Append _node_ to the top node in the _stack of open nodes_.
   - If _token_ is a start tag token,
     1. Create an element _node_ with tag name and attributes given by
        the token.
     2. Append _node_ to the top node in the _stack of open nodes_.
     3. Push _node_ onto the top of the _stack of open nodes_.
     4. If _node_ is a ``template`` element, then:
        1. Let _fragment_ be the ``DocumentFragment`` object that the
          ``template`` element uses as its template contents container.
        2. Push _fragment_ onto the top of the _stack of open nodes_.
   - If _token_ is an end tag token:
     1. Let _node_ be the topmost node in the _stack of open nodes_
        whose tag name is the same as the token's tag name, if any. If
        there isn't one, skip this token.
     2. If there's a ``template`` element in the _stack of open
        nodes_ above _node_, then skip this token.
     3. Pop nodes from the _stack of open nodes_ until _node_ has been
        popped.
     4. If _node_'s tag name is ``script``, then yield until there
        are no pending import loads, then execute the script given by
        the element's contents.
3. Yield until there are no pending import loads.
4. Fire a ``load`` event at the _parsing context_ object.
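
The stack handling in step 2 can be sketched as follows. The `Node` class and the tuple token shapes (`("string", value)` and `("start" | "end", name, attributes)`) are assumptions made for this example. Script execution, waiting for import loads, and the ``load`` event are omitted.

```python
class Node:
    def __init__(self, kind, name=None, attrs=None, data=None):
        self.kind = kind  # "document", "element", "fragment" or "text"
        self.name = name
        self.attrs = dict(attrs or [])
        self.data = data
        self.children = []
        self.content = None  # template contents fragment, if any


def build_tree(tokens):
    document = Node("document")
    stack = [document]  # stack of open nodes; the top is the last item
    for token in tokens:
        kind = token[0]
        if kind == "string":
            value = token[1]
            only_space = all(c in " \n" for c in value)
            if only_space and not any(n.name == "t" for n in stack):
                continue  # skip whitespace outside any t element
            stack[-1].children.append(Node("text", data=value))
        elif kind == "start":
            node = Node("element", token[1], token[2])
            stack[-1].children.append(node)
            stack.append(node)
            if node.name == "template":
                node.content = Node("fragment")
                stack.append(node.content)
        elif kind == "end":
            # Find the topmost open node with the same tag name.
            index = next(
                (i for i in reversed(range(len(stack))) if stack[i].name == token[1]),
                None,
            )
            if index is None:
                continue
            if any(n.name == "template" for n in stack[index + 1:]):
                continue  # a template above the node shields it
            del stack[index:]  # pop until the node has been popped
    return document
```

Pushing the template's fragment onto the stack is what routes the template's children into its contents rather than into the element itself.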