提交 · 235c82acca0491465e94be3cae2583b42d37c859 · openeuler / qemu

25 8月, 2018 40 次提交

Merge remote-tracking branch 'remotes/otubo/tags/pull-seccomp-20180823' into staging · 235c82ac

由 Peter Maydell 提交于 8月 25, 2018

pull-seccomp-20180823

# gpg: Signature made Thu 23 Aug 2018 15:46:13 BST
# gpg:                using RSA key DF32E7C0F0FFF9A2
# gpg: Good signature from "Eduardo Otubo (Senior Software Engineer) <otubo@redhat.com>"
# Primary key fingerprint: D67E 1B50 9374 86B4 0723  DBAB DF32 E7C0 F0FF F9A2

* remotes/otubo/tags/pull-seccomp-20180823:
  seccomp: set the seccomp filter to all threads
  configure: require libseccomp 2.2.0
  seccomp: prefer SCMP_ACT_KILL_PROCESS if available
  seccomp: use SIGSYS signal instead of killing the thread
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

235c82ac

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-fixes-20180823.1' into staging · 17182bb4

由 Peter Maydell 提交于 8月 25, 2018

VFIO fixes 2018-08-23

 - Fix coverity reported issue with use of realpath (Alex Williamson)

 - Cleanup file descriptor in error path (Alex Williamson)

 - Fix postcopy use of new balloon inhibitor (Alex Williamson)

# gpg: Signature made Thu 23 Aug 2018 17:46:41 BST
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-fixes-20180823.1:
  postcopy: Synchronize usage of the balloon inhibitor
  vfio/pci: Fix failure to close file descriptor on error
  vfio/pci: Handle subsystem realpath() returning NULL
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

17182bb4

Merge remote-tracking branch 'remotes/armbru/tags/pull-qobject-2018-08-24' into staging · cc9821fa

由 Peter Maydell 提交于 8月 25, 2018

QObject patches for 2018-08-24

# gpg: Signature made Fri 24 Aug 2018 20:28:53 BST
# gpg:                using RSA key 3870B400EB918653
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* remotes/armbru/tags/pull-qobject-2018-08-24: (58 commits)
  json: Update references to RFC 7159 to RFC 8259
  json: Support %% in JSON strings when interpolating
  json: Improve safety of qobject_from_jsonf_nofail() & friends
  json: Keep interpolation state in JSONParserContext
  tests/drive_del-test: Fix harmless JSON interpolation bug
  json: Clean up headers
  qobject: Drop superfluous includes of qemu-common.h
  json: Make JSONToken opaque outside json-parser.c
  json: Unbox tokens queue in JSONMessageParser
  json: Streamline json_message_process_token()
  json: Enforce token count and size limits more tightly
  qjson: Have qobject_from_json() & friends reject empty and blank
  json: Assert json_parser_parse() consumes all tokens on success
  json: Fix streamer not to ignore trailing unterminated structures
  json: Fix latent parser aborts at end of input
  qjson: Fix qobject_from_json() & friends for multiple values
  json: Improve names of lexer states related to numbers
  json: Replace %I64d, %I64u by %PRId64, %PRIu64
  json: Leave rejecting invalid interpolation to parser
  json: Pass lexical errors and limit violations to callback
  ...
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

cc9821fa

Merge remote-tracking branch 'remotes/amarkovic/tags/mips-queue-aug-2018' into staging · e2e6fa67

由 Peter Maydell 提交于 8月 24, 2018

MIPS queue August 2018 v6

# gpg: Signature made Fri 24 Aug 2018 16:52:27 BST
# gpg:                using RSA key D4972A8967F75A65
# gpg: Good signature from "Aleksandar Markovic <amarkovic@wavecomp.com>"
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 8526 FBF1 5DA3 811F 4A01  DD75 D497 2A89 67F7 5A65

* remotes/amarkovic/tags/mips-queue-aug-2018: (45 commits)
  target/mips: Add definition of nanoMIPS I7200 CPU
  mips_malta: Fix semihosting argument passing for nanoMIPS bare metal
  mips_malta: Add setting up GT64120 BARs to the nanoMIPS bootloader
  mips_malta: Add basic nanoMIPS boot code for Malta board
  elf: Don't check FCR31_NAN2008 bit for nanoMIPS
  elf: On elf loading, treat both EM_MIPS and EM_NANOMIPS as legal for MIPS
  elf: Relax MIPS' elf_check_arch() to accept EM_NANOMIPS too
  elf: Add EM_NANOMIPS value as a valid one for e_machine field
  target/mips: Fix ERET/ERETNC behavior related to ADEL exception
  target/mips: Add updating BadInstr and BadInstrX for nanoMIPS
  target/mips: Add availability control via bit NMS
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 6
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 5
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 4
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 3
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 2
  target/mips: Add emulation of DSP ASE for nanoMIPS - part 1
  target/mips: Implement MT ASE support for nanoMIPS
  target/mips: Fix pre-nanoMIPS MT ASE instructions availability control
  target/mips: Add emulation of nanoMIPS 32-bit branch instructions
  ...
Signed-off-by: NPeter Maydell <peter.maydell@linaro.org>

e2e6fa67

json: Update references to RFC 7159 to RFC 8259 · 37aded92

由 Markus Armbruster 提交于 8月 23, 2018

RFC 8259 (December 2017) obsoletes RFC 7159 (March 2014).
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-59-armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>

37aded92

json: Support %% in JSON strings when interpolating · 8bca4613

由 Markus Armbruster 提交于 8月 23, 2018

The previous commit makes JSON strings containing '%' awkward to
express in templates: you'd have to mask the '%' with an Unicode
escape \u0025.  No template currently contains such JSON strings.
Support the printf conversion specification %% in JSON strings as a
convenience anyway, because it's trivially easy to do.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-58-armbru@redhat.com>

8bca4613

json: Improve safety of qobject_from_jsonf_nofail() & friends · 16a48599

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser optionally supports interpolation.  This is used to
build QObjects by parsing string templates.  The templates are C
literals, so parse errors (such as invalid interpolation
specifications) are actually programming errors.  Consequently, the
functions providing parsing with interpolation
(qobject_from_jsonf_nofail(), qobject_from_vjsonf_nofail(),
qdict_from_jsonf_nofail(), qdict_from_vjsonf_nofail()) pass
&error_abort to the parser.

However, there's another, more dangerous kind of programming error:
since we use va_arg() to get the value to interpolate, behavior is
undefined when the variable argument isn't consistent with the
interpolation specification.

The same problem exists with printf()-like functions, and the solution
is to have the compiler check consistency.  This is what
GCC_FMT_ATTR() is about.

To enable this type checking for interpolation as well, we carefully
chose our interpolation specifications to match printf conversion
specifications, and decorate functions parsing templates with
GCC_FMT_ATTR().

Note that this only protects against undefined behavior due to type
errors.  It can't protect against use of invalid interpolation
specifications that happen to be valid printf conversion
specifications.

However, there's still a gaping hole in the type checking: GCC
recognizes '%' as start of printf conversion specification anywhere in
the template, but the parser recognizes it only outside JSON strings.
For instance, if someone were to pass a "{ '%s': %d }" template, GCC
would require a char * and an int argument, but the parser would
va_arg() only an int argument, resulting in undefined behavior.

Avoid undefined behavior by catching the programming error at run
time: have the parser recognize and reject '%' in JSON strings.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-57-armbru@redhat.com>

16a48599

json: Keep interpolation state in JSONParserContext · ada74c3b

由 Markus Armbruster 提交于 8月 23, 2018

The recursive descent parser passes along a pointer to
JSONParserContext.  It additionally passes a pointer to interpolation
state (a va_alist *) as needed to reach its consumer
parse_interpolation().

Stuffing the latter pointer into JSONParserContext saves us the
trouble of passing it along, so do that.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-56-armbru@redhat.com>

ada74c3b

tests/drive_del-test: Fix harmless JSON interpolation bug · 83273e84

由 Markus Armbruster 提交于 8月 23, 2018

test_after_failed_device_add() does this:

    response = qmp("{'execute': 'device_add',"
                   " 'arguments': {"
                   "   'driver': 'virtio-blk-%s',"
                   "   'drive': 'drive0'"
                   "}}", qvirtio_get_dev_type());

Wrong.  An interpolation specification must be a JSON token, it
doesn't work within JSON string tokens.  The code above doesn't use
the value of qvirtio_get_dev_type(), and sends arguments

    {"driver": "virtio-blk-%s", "drive": "drive0"}}

The command fails because there is no driver named "virtio-blk-%".
Harmless, since the test wants the command to fail.  Screwed up in
commit 2f84a92e.

Fix the obvious way.  The command now fails because the drive is
empty, like it did before commit 2f84a92e.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-55-armbru@redhat.com>

83273e84

json: Clean up headers · 86cdf9ec

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser has three public headers, json-lexer.h, json-parser.h,
json-streamer.h.  They all contain stuff that is of no interest
outside qobject/json-*.c.

Collect the public interface in include/qapi/qmp/json-parser.h, and
everything else in qobject/json-parser-int.h.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-54-armbru@redhat.com>

86cdf9ec

qobject: Drop superfluous includes of qemu-common.h · 812ce33e

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-53-armbru@redhat.com>

812ce33e

json: Make JSONToken opaque outside json-parser.c · abe7c206

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-52-armbru@redhat.com>

abe7c206

json: Unbox tokens queue in JSONMessageParser · a2731e08

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-51-armbru@redhat.com>

a2731e08

json: Streamline json_message_process_token() · 8d3265b3

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-50-armbru@redhat.com>

8d3265b3

json: Enforce token count and size limits more tightly · da09cfbf

由 Markus Armbruster 提交于 8月 23, 2018

Token count and size limits exist to guard against excessive heap
usage.  We check them only after we created the token on the heap.
That's assigning a cowboy to the barn to lasso the horse after it has
bolted.  Close the barn door instead: check before we create the
token.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-49-armbru@redhat.com>

da09cfbf

qjson: Have qobject_from_json() & friends reject empty and blank · dd98e848

由 Markus Armbruster 提交于 8月 23, 2018

The last case where qobject_from_json() & friends return null without
setting an error is empty or blank input.  Callers:

* block.c's parse_json_protocol() reports "Could not parse the JSON
  options".  It's marked as a work-around, because it also covered
  actual bugs, but they got fixed in the previous few commits.

* qobject_input_visitor_new_str() reports "JSON parse error".  Also
  marked as work-around.  The recent fixes have made this unreachable,
  because it currently gets called only for input starting with '{'.

* check-qjson.c's empty_input() and blank_input() demonstrate the
  behavior.

* The other callers are not affected since they only pass input with
  exactly one JSON value or, in the case of negative tests, one error.

Fail with "Expecting a JSON value" instead of returning null, and
simplify callers.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-48-armbru@redhat.com>

dd98e848

json: Assert json_parser_parse() consumes all tokens on success · 5d50113c

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-47-armbru@redhat.com>

5d50113c

json: Fix streamer not to ignore trailing unterminated structures · f9277915

由 Markus Armbruster 提交于 8月 23, 2018

json_message_process_token() accumulates tokens until it got the
sequence of tokens that comprise a single JSON value (it counts curly
braces and square brackets to decide).  It feeds those token sequences
to json_parser_parse().  If a non-empty sequence of tokens remains at
the end of the parse, it's silently ignored.  check-qjson.c cases
unterminated_array(), unterminated_array_comma(), unterminated_dict(),
unterminated_dict_comma() demonstrate this bug.

Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
streamer receives it, it feeds the accumulated tokens to
json_parser_parse().
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-46-armbru@redhat.com>

f9277915

json: Fix latent parser aborts at end of input · e06d008a

由 Markus Armbruster 提交于 8月 23, 2018

json-parser.c carefully reports end of input like this:

    token = parser_context_pop_token(ctxt);
    if (token == NULL) {
        parse_error(ctxt, NULL, "premature EOI");
        goto out;
    }

Except parser_context_pop_token() can't return null, it fails its
assertion instead.  Same for parser_context_peek_token().  Broken in
commit 65c0f1e9, and faithfully preserved in commit 95385fe9.
Only a latent bug, because the streamer throws away any input that
could trigger it.

Drop the assertions, so we can fix the streamer in the next commit.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-45-armbru@redhat.com>

e06d008a

qjson: Fix qobject_from_json() & friends for multiple values · 2a4794ba

由 Markus Armbruster 提交于 8月 23, 2018

qobject_from_json() & friends use the consume_json() callback to
receive either a value or an error from the parser.

When they are fed a string that contains more than either one JSON
value or one JSON syntax error, consume_json() gets called multiple
times.

When the last call receives a value, qobject_from_json() returns that
value.  Any other values are leaked.

When any call receives an error, qobject_from_json() sets the first
error received.  Any other errors are thrown away.

When values follow errors, qobject_from_json() returns both a value
and sets an error.  That's bad.  Impact:

* block.c's parse_json_protocol() ignores and leaks the value.  It's
  used to to parse pseudo-filenames starting with "json:".  The
  pseudo-filenames can come from the user or from image meta-data such
  as a QCOW2 image's backing file name.

* vl.c's parse_display_qapi() ignores and leaks the error.  It's used
  to parse the argument of command line option -display.

* vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
  it in @err.  main() will then pass a pointer to a non-null Error *
  to net_init_clients(), which is forbidden.  It can lead to assertion
  failure or other misbehavior.

* check-qjson.c's multiple_values() demonstrates the badness.

* The other callers are not affected since they only pass strings with
  exactly one JSON value or, in the case of negative tests, one
  error.

The impact on the _nofail() functions is relatively harmless.  They
abort when any call receives an error.  Else they return the last
value, and leak the others, if any.

Fix consume_json() as follows.  On the first call, save value and
error as before.  On subsequent calls, if any, don't save them.  If
the first call saved a value, the next call, if any, replaces the
value by an "Expecting at most one JSON value" error.  Take care not
to leak values or errors that aren't saved.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-44-armbru@redhat.com>

2a4794ba

json: Improve names of lexer states related to numbers · 4d400661

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-43-armbru@redhat.com>

4d400661

json: Replace %I64d, %I64u by %PRId64, %PRIu64 · 53a0d616

由 Markus Armbruster 提交于 8月 23, 2018

Support for %I64d got added in commit 2c0d4b36 "json: fix PRId64 on
Win32".  We had to hard-code I64d because we used the lexer's finite
state machine to check interpolations.  No more, so clean this up.

Additional conversion specifications would be easy enough to implement
when needed.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-42-armbru@redhat.com>

53a0d616

json: Leave rejecting invalid interpolation to parser · f7617d45

由 Markus Armbruster 提交于 8月 23, 2018

Both lexer and parser reject invalid interpolation specifications.
The parser's check is useless.

The lexer ends the token right after the first bad character.  This
tends to lead to suboptimal error reporting.  For instance, input

    [ %04d ]

produces the tokens

    JSON_LSQUARE  [
    JSON_ERROR    %0
    JSON_INTEGER  4
    JSON_KEYWORD  d
    JSON_RSQUARE  ]

The parser then yields an error, an object and two more errors:

    error: Invalid JSON syntax
    object: 4
    error: JSON parse error, invalid keyword
    error: JSON parse error, expecting value

Dumb down the lexer to accept [A-Za-z0-9]*.  The parser's check is now
used.  Emit a proper error there.

The lexer now produces

    JSON_LSQUARE  [
    JSON_INTERP   %04d
    JSON_RSQUARE  ]

and the parser reports just

    JSON parse error, invalid interpolation '%04d'
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-41-armbru@redhat.com>

f7617d45

json: Pass lexical errors and limit violations to callback · 84a56f38

由 Markus Armbruster 提交于 8月 23, 2018

The callback to consume JSON values takes QObject *json, Error *err.
If both are null, the callback is supposed to make up an error by
itself.  This sucks.

qjson.c's consume_json() neglects to do so, which makes
qobject_from_json() null instead of failing.  I consider that a bug.

The culprit is json_message_process_token(): it passes two null
pointers when it runs into a lexical error or a limit violation.  Fix
it to pass a proper Error object then.  Update the callbacks:

* monitor.c's handle_qmp_command(): the code to make up an error is
  now dead, drop it.

* qga/main.c's process_event(): lumps the "both null" case together
  with the "not a JSON object" case.  The former is now gone.  The
  error message "Invalid JSON syntax" is misleading for the latter.
  Improve it to "Input must be a JSON object".

* qobject/qjson.c's consume_json(): no update; check-qjson
  demonstrates qobject_from_json() now sets an error on lexical
  errors, but still doesn't on some other errors.

* tests/libqtest.c's qmp_response(): the Error object is now reliable,
  so use it to improve the error message.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-40-armbru@redhat.com>

84a56f38

json: Treat unwanted interpolation as lexical error · 2cbd15aa

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser optionally supports interpolation.  The lexer
recognizes interpolation tokens unconditionally.  The parser rejects
them when interpolation is disabled, in parse_interpolation().
However, it neglects to set an error then, which can make
json_parser_parse() fail without setting an error.

Move the check for unwanted interpolation from the parser's
parse_interpolation() into the lexer's finite state machine.  When
interpolation is disabled, '%' is now handled like any other
unexpected character.

The next commit will improve how such lexical errors are handled.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-39-armbru@redhat.com>

2cbd15aa

json: Rename token JSON_ESCAPE & friends to JSON_INTERP · 61030280

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser optionally supports interpolation.  The code calls it
"escape".  Awkward, because it uses the same term for escape sequences
within strings.  The latter usage is consistent with RFC 8259 "The
JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
Call the former "interpolation" instead.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-38-armbru@redhat.com>

61030280

json: Don't create JSON_ERROR tokens that won't be used · 269e57ae

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-37-armbru@redhat.com>

269e57ae

json: Don't pass null @tokens to json_parser_parse() · ff281a27

由 Markus Armbruster 提交于 8月 23, 2018

json_parser_parse() normally returns the QObject on success.  Except
it returns null when its @tokens argument is null.

Its only caller json_message_process_token() passes null @tokens when
emitting a lexical error.  The call is a rather opaque way to say json
= NULL then.

Simplify matters by lifting the assignment to json out of the emit
path: initialize json to null, set it to the value of
json_parser_parse() when there's no lexical error.  Drop the special
case from json_parser_parse().
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-36-armbru@redhat.com>

ff281a27

json: Redesign the callback to consume JSON values · 62815d85

由 Markus Armbruster 提交于 8月 23, 2018

The classical way to structure parser and lexer is to have the client
call the parser to get an abstract syntax tree, the parser call the
lexer to get the next token, and the lexer call some function to get
input characters.

Another way to structure them would be to have the client feed
characters to the lexer, the lexer feed tokens to the parser, and the
parser feed abstract syntax trees to some callback provided by the
client.  This way is more easily integrated into an event loop that
dispatches input characters as they arrive.

Our JSON parser is kind of between the two.  The lexer feeds tokens to
a "streamer" instead of a real parser.  The streamer accumulates
tokens until it got the sequence of tokens that comprise a single JSON
value (it counts curly braces and square brackets to decide).  It
feeds those token sequences to a callback provided by the client.  The
callback passes each token sequence to the parser, and gets back an
abstract syntax tree.

I figure it was done that way to make a straightforward recursive
descent parser possible.  "Get next token" becomes "pop the first
token off the token sequence".  Drawback: we need to store a complete
token sequence.  Each token eats 13 + input characters + malloc
overhead bytes.

Observations:

1. This is not the only way to use recursive descent.  If we replaced
   "get next token" by a coroutine yield, we could do without a
   streamer.

2. The lexer reports errors by passing a JSON_ERROR token to the
   streamer.  This communicates the offending input characters and
   their location, but no more.

3. The streamer reports errors by passing a null token sequence to the
   callback.  The (already poor) lexical error information is thrown
   away.

4. Having the callback receive a token sequence duplicates the code to
   convert token sequence to abstract syntax tree in every callback.

5. Known bug: the streamer silently drops incomplete token sequences.

This commit rectifies 4. by lifting the call of the parser from the
callbacks into the streamer.  Later commits will address 3. and 5.

The lifting removes a bug from qjson.c's parse_json(): it passed a
pointer to a non-null Error * in certain cases, as demonstrated by
check-qjson.c.

json_parser_parse() is now unused.  It's a stupid wrapper around
json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
to json_parser_parse().
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-35-armbru@redhat.com>

62815d85

json: Have lexer call streamer directly · 037f2440

由 Markus Armbruster 提交于 8月 23, 2018

json_lexer_init() takes the function to process a token as an
argument.  It's always json_message_process_token().  Makes the code
harder to understand for no actual gain.  Drop the indirection.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-34-armbru@redhat.com>

037f2440

json-parser: simplify and avoid JSONParserContext allocation · e8b19d7d

由 Marc-André Lureau 提交于 8月 23, 2018

parser_context_new/free() are only used from json_parser_parse(). We
can fold the code there and avoid an allocation altogether.
Signed-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-33-armbru@redhat.com>

e8b19d7d

json: remove useless return value from lexer/parser · 7c1e1d54

由 Marc-André Lureau 提交于 8月 23, 2018

The lexer always returns 0 when char feeding. Furthermore, none of the
caller care about the return value.
Signed-off-by: NMarc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
Reviewed-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Message-Id: <20180823164025.12553-32-armbru@redhat.com>

7c1e1d54

check-qjson: Fix and enable utf8_string()'s disabled part · c473c379

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-31-armbru@redhat.com>

c473c379

json: Fix \uXXXX for surrogate pairs · dc45a07c

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser treats each half of a surrogate pair as unpaired
surrogate.  Fix it to recognize surrogate pairs.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-30-armbru@redhat.com>

dc45a07c

json: Reject invalid \uXXXX, fix \u0000 · 46a628b1

由 Markus Armbruster 提交于 8月 23, 2018

The JSON parser translates invalid \uXXXX to garbage instead of
rejecting it, and swallows \u0000.

Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8().

Valid surrogate pairs are now differently broken: they're rejected
instead of translated to garbage.  The next commit will fix them.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-29-armbru@redhat.com>

46a628b1

json: Simplify parse_string() · de6decfe

由 Markus Armbruster 提交于 8月 23, 2018

Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-28-armbru@redhat.com>

de6decfe

json: Leave rejecting invalid escape sequences to parser · b2da4a4d

由 Markus Armbruster 提交于 8月 23, 2018

Both lexer and parser reject invalid escape sequences in strings.  The
parser's check is useless.

The lexer ends the token right after the first non-well-formed byte.
This tends to lead to suboptimal error reporting.  For instance, input

    {"abc\@ijk": 1}

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\@
    JSON_KEYWORD  ijk
    JSON_ERROR   ": 1}\n

The parser then reports three errors

    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

Drop the lexer's escape sequence checking, and make it accept the same
characters after backslash it accepts elsewhere in strings.  It now
produces

    JSON_LCURLY   {
    JSON_STRING   "abc\@ijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY

and the parser reports just

    JSON parse error, invalid escape sequence in string

While there, fix parse_string()'s inaccurate function comment.
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-27-armbru@redhat.com>

b2da4a4d

json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") · 4b1c0cd7

由 Markus Armbruster 提交于 8月 23, 2018

Since the JSON grammer doesn't accept U+0000 anywhere, this merely
exchanges one kind of parse error for another.  It's purely for
consistency with qobject_to_json(), which accepts \xC0\x80 (see commit
e2ec3f97).
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-26-armbru@redhat.com>

4b1c0cd7

json: Leave rejecting invalid UTF-8 to parser · de930f45

由 Markus Armbruster 提交于 8月 23, 2018

Both the lexer and the parser (attempt to) validate UTF-8 in JSON
strings.

The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
\xF5..\xFF.  This rejects some, but not all invalid UTF-8.  It also
rejects ASCII control characters \x00..\x1F, in accordance with RFC
8259 (see recent commit "json: Reject unescaped control characters").

When the lexer rejects, it ends the token right after the first bad
byte.  Good when the bad byte is a newline.  Not so good when it's
something like an overlong sequence in the middle of a string.  For
instance, input

    {"abc\xC0\xAFijk": 1}\n

produces the tokens

    JSON_LCURLY   {
    JSON_ERROR    "abc\xC0
    JSON_ERROR    \xAF
    JSON_KEYWORD  ijk
    JSON_ERROR   ": 1}\n

The parser then reports four errors

    Invalid JSON syntax
    Invalid JSON syntax
    JSON parse error, invalid keyword 'ijk'
    Invalid JSON syntax

before it recovers at the newline.

The commit before previous made the parser reject invalid UTF-8
sequences.  Since then, anything the lexer rejects, the parser would
reject as well.  Thus, the lexer's rejecting is unnecessary for
correctness, and harmful for error reporting.

However, we want to keep rejecting ASCII control characters in the
lexer, because that produces the behavior we want for unclosed
strings.

We also need to keep rejecting \xFF in the lexer, because we
documented that as a way to reset the JSON parser
(docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
means we can't change how we recover from this error now.  I wish we
hadn't done that.

I think we should treat \xFE the same as \xFF.

Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
strings is much improved, except for \xFE and \xFF.  For the example
above, the lexer now produces

    JSON_LCURLY   {
    JSON_STRING   "abc\xC0\xAFijk"
    JSON_COLON    :
    JSON_INTEGER  1
    JSON_RCURLY

and the parser reports just

    JSON parse error, invalid UTF-8 sequence in string
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-25-armbru@redhat.com>

de930f45

json: Report first rather than last parse error · 574bf16f

由 Markus Armbruster 提交于 8月 23, 2018

Quiz time!  When a parser reports multiple errors, but the user gets
to see just one, which one is (on average) the least useful one?

Yes, you're right, it's the last one!  You're clearly familiar with
compilers.

Which one does QEMU report?

Right again, the last one!  You're clearly familiar with QEMU.

Reproducer: feeding

    {"abc\xC2ijk": 1}\n

to QMP produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}}

Report the first error instead.  The reproducer now produces

    {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}}
Signed-off-by: NMarkus Armbruster <armbru@redhat.com>
Reviewed-by: NEric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-24-armbru@redhat.com>

574bf16f