1. 25 August 2018, 40 commits
    • qjson: Have qobject_from_json() & friends reject empty and blank · dd98e848
      Markus Armbruster committed
      The last case where qobject_from_json() & friends return null without
      setting an error is empty or blank input.  Callers:
      
      * block.c's parse_json_protocol() reports "Could not parse the JSON
        options".  It's marked as a work-around, because it also covered
        actual bugs, but they got fixed in the previous few commits.
      
      * qobject_input_visitor_new_str() reports "JSON parse error".  Also
        marked as work-around.  The recent fixes have made this unreachable,
        because it currently gets called only for input starting with '{'.
      
      * check-qjson.c's empty_input() and blank_input() demonstrate the
        behavior.
      
      * The other callers are not affected since they only pass input with
        exactly one JSON value or, in the case of negative tests, one error.
      
      Fail with "Expecting a JSON value" instead of returning null, and
      simplify callers.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-48-armbru@redhat.com>
    • json: Assert json_parser_parse() consumes all tokens on success · 5d50113c
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-47-armbru@redhat.com>
    • json: Fix streamer not to ignore trailing unterminated structures · f9277915
      Markus Armbruster committed
      json_message_process_token() accumulates tokens until it has the
      sequence of tokens that makes up a single JSON value (it counts curly
      braces and square brackets to decide).  It feeds those token sequences
      to json_parser_parse().  If a non-empty sequence of tokens remains at
      the end of the parse, it's silently ignored.  check-qjson.c cases
      unterminated_array(), unterminated_array_comma(), unterminated_dict(),
      unterminated_dict_comma() demonstrate this bug.
      
      Fix as follows.  Introduce a JSON_END_OF_INPUT token.  When the
      streamer receives it, it feeds the accumulated tokens to
      json_parser_parse().
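      
      A minimal sketch of the brace/bracket counting and the new
      end-of-input flush, with made-up names rather than QEMU's actual
      streamer code:
      
          /* Illustrative only: a tiny push-style streamer. */
          enum { TOK_LCURLY, TOK_RCURLY, TOK_LSQUARE, TOK_RSQUARE,
                 TOK_OTHER, TOK_END_OF_INPUT };
      
          typedef struct Streamer {
              int brace_count;        /* { minus } seen so far */
              int bracket_count;      /* [ minus ] seen so far */
              int token_count;        /* tokens accumulated for the next parse */
          } Streamer;
      
          /* Returns 1 when the accumulated tokens should go to the parser. */
          static int streamer_feed(Streamer *s, int token_type)
          {
              if (token_type == TOK_END_OF_INPUT) {
                  /* New behavior: flush whatever is buffered, so trailing
                   * unterminated structures reach the parser and produce a
                   * proper error instead of being silently dropped. */
                  return s->token_count != 0;
              }
              s->token_count++;
              switch (token_type) {
              case TOK_LCURLY:  s->brace_count++;   break;
              case TOK_RCURLY:  s->brace_count--;   break;
              case TOK_LSQUARE: s->bracket_count++; break;
              case TOK_RSQUARE: s->bracket_count--; break;
              }
              /* A complete JSON value: all opened braces/brackets closed. */
              return s->brace_count == 0 && s->bracket_count == 0;
          }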
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-46-armbru@redhat.com>
    • json: Fix latent parser aborts at end of input · e06d008a
      Markus Armbruster committed
      json-parser.c carefully reports end of input like this:
      
          token = parser_context_pop_token(ctxt);
          if (token == NULL) {
              parse_error(ctxt, NULL, "premature EOI");
              goto out;
          }
      
      Except parser_context_pop_token() can't return null, it fails its
      assertion instead.  Same for parser_context_peek_token().  Broken in
      commit 65c0f1e9, and faithfully preserved in commit 95385fe9.
      Only a latent bug, because the streamer throws away any input that
      could trigger it.
      
      Drop the assertions, so we can fix the streamer in the next commit.
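      
      For illustration, a token queue whose pop reports end of input by
      returning null rather than asserting (names are made up, not
      json-parser.c's actual parser_context_pop_token()):
      
          #include <stddef.h>
      
          typedef struct Token { int type; const char *text; } Token;
      
          typedef struct TokenQueue {
              Token *tokens;
              size_t count;
              size_t pos;
          } TokenQueue;
      
          /* Returns the next token, or NULL at end of input, so the caller
           * can report "premature EOI" as the snippet above intends. */
          static Token *token_queue_pop(TokenQueue *q)
          {
              if (q->pos >= q->count) {
                  return NULL;
              }
              return &q->tokens[q->pos++];
          }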
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-45-armbru@redhat.com>
    • qjson: Fix qobject_from_json() & friends for multiple values · 2a4794ba
      Markus Armbruster committed
      qobject_from_json() & friends use the consume_json() callback to
      receive either a value or an error from the parser.
      
      When they are fed a string that contains more than either one JSON
      value or one JSON syntax error, consume_json() gets called multiple
      times.
      
      When the last call receives a value, qobject_from_json() returns that
      value.  Any other values are leaked.
      
      When any call receives an error, qobject_from_json() sets the first
      error received.  Any other errors are thrown away.
      
      When values follow errors, qobject_from_json() returns both a value
      and sets an error.  That's bad.  Impact:
      
      * block.c's parse_json_protocol() ignores and leaks the value.  It's
  used to parse pseudo-filenames starting with "json:".  The
        pseudo-filenames can come from the user or from image meta-data such
        as a QCOW2 image's backing file name.
      
      * vl.c's parse_display_qapi() ignores and leaks the error.  It's used
        to parse the argument of command line option -display.
      
      * vl.c's main() case QEMU_OPTION_blockdev ignores the error and leaves
        it in @err.  main() will then pass a pointer to a non-null Error *
        to net_init_clients(), which is forbidden.  It can lead to assertion
        failure or other misbehavior.
      
      * check-qjson.c's multiple_values() demonstrates the badness.
      
      * The other callers are not affected since they only pass strings with
        exactly one JSON value or, in the case of negative tests, one
        error.
      
      The impact on the _nofail() functions is relatively harmless.  They
      abort when any call receives an error.  Else they return the last
      value, and leak the others, if any.
      
      Fix consume_json() as follows.  On the first call, save value and
      error as before.  On subsequent calls, if any, don't save them.  If
      the first call saved a value, the next call, if any, replaces the
      value by an "Expecting at most one JSON value" error.  Take care not
      to leak values or errors that aren't saved.
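      
      A sketch of that bookkeeping, using made-up types rather than
      qjson.c's actual QObject/Error machinery:
      
          #include <stdlib.h>
          #include <string.h>
      
          typedef struct ParseResult {
              char *value;            /* first JSON value received (owned) */
              char *error;            /* first error received (owned) */
              int calls;              /* number of times the callback ran */
          } ParseResult;
      
          /* Takes ownership of @value and @error (heap-allocated or NULL). */
          static void consume(ParseResult *r, char *value, char *error)
          {
              if (r->calls++ == 0) {
                  /* First call: save whatever we got. */
                  r->value = value;
                  r->error = error;
                  return;
              }
              if (r->value) {
                  /* A second result arrived after a value: replace the
                   * saved value with an error. */
                  free(r->value);
                  r->value = NULL;
                  free(r->error);
                  r->error = strdup("Expecting at most one JSON value");
              }
              /* Results from later calls are not saved; don't leak them. */
              free(value);
              free(error);
          }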
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-44-armbru@redhat.com>
    • json: Improve names of lexer states related to numbers · 4d400661
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-43-armbru@redhat.com>
    • json: Replace %I64d, %I64u by %PRId64, %PRIu64 · 53a0d616
      Markus Armbruster committed
      Support for %I64d got added in commit 2c0d4b36 "json: fix PRId64 on
      Win32".  We had to hard-code I64d because we used the lexer's finite
      state machine to check interpolations.  No more, so clean this up.
      
      Additional conversion specifications would be easy enough to implement
      when needed.
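      
      For reference, the portable macros from <inttypes.h> that replace the
      hard-coded %I64d/%I64u (plain standalone C, not QEMU code):
      
          #include <inttypes.h>
          #include <stdio.h>
      
          int main(void)
          {
              int64_t sval = -42;
              uint64_t uval = 42;
      
              /* PRId64/PRIu64 expand to the right conversion on each
               * platform, e.g. "I64d" on Win32, "ld" or "lld" elsewhere. */
              printf("signed: %" PRId64 ", unsigned: %" PRIu64 "\n",
                     sval, uval);
              return 0;
          }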
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-42-armbru@redhat.com>
    • json: Leave rejecting invalid interpolation to parser · f7617d45
      Markus Armbruster committed
      Both lexer and parser reject invalid interpolation specifications.
      The parser's check is useless.
      
      The lexer ends the token right after the first bad character.  This
      tends to lead to suboptimal error reporting.  For instance, input
      
          [ %04d ]
      
      produces the tokens
      
          JSON_LSQUARE  [
          JSON_ERROR    %0
          JSON_INTEGER  4
          JSON_KEYWORD  d
          JSON_RSQUARE  ]
      
      The parser then yields an error, an object and two more errors:
      
          error: Invalid JSON syntax
          object: 4
          error: JSON parse error, invalid keyword
          error: JSON parse error, expecting value
      
      Dumb down the lexer to accept [A-Za-z0-9]*.  The parser's check is now
      used.  Emit a proper error there.
      
      The lexer now produces
      
          JSON_LSQUARE  [
          JSON_INTERP   %04d
          JSON_RSQUARE  ]
      
      and the parser reports just
      
          JSON parse error, invalid interpolation '%04d'
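      
      A sketch of a parser-side check; the accepted set below is invented
      for illustration and is not QEMU's actual list of conversions:
      
          #include <string.h>
      
          static int valid_interpolation(const char *tok)
          {
              /* Hypothetical allow-list; the real parser accepts more. */
              static const char *const accepted[] = { "%d", "%u", "%s", NULL };
              int i;
      
              for (i = 0; accepted[i]; i++) {
                  if (strcmp(tok, accepted[i]) == 0) {
                      return 1;
                  }
              }
              return 0;   /* e.g. "%04d": report "invalid interpolation" */
          }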
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-41-armbru@redhat.com>
    • json: Pass lexical errors and limit violations to callback · 84a56f38
      Markus Armbruster committed
      The callback to consume JSON values takes QObject *json, Error *err.
      If both are null, the callback is supposed to make up an error by
      itself.  This sucks.
      
      qjson.c's consume_json() neglects to do so, which makes
      qobject_from_json() return null instead of failing.  I consider that a bug.
      
      The culprit is json_message_process_token(): it passes two null
      pointers when it runs into a lexical error or a limit violation.  Fix
      it to pass a proper Error object then.  Update the callbacks:
      
      * monitor.c's handle_qmp_command(): the code to make up an error is
        now dead, drop it.
      
      * qga/main.c's process_event(): lumps the "both null" case together
        with the "not a JSON object" case.  The former is now gone.  The
        error message "Invalid JSON syntax" is misleading for the latter.
        Improve it to "Input must be a JSON object".
      
      * qobject/qjson.c's consume_json(): no update; check-qjson
        demonstrates qobject_from_json() now sets an error on lexical
        errors, but still doesn't on some other errors.
      
      * tests/libqtest.c's qmp_response(): the Error object is now reliable,
        so use it to improve the error message.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-40-armbru@redhat.com>
    • json: Treat unwanted interpolation as lexical error · 2cbd15aa
      Markus Armbruster committed
      The JSON parser optionally supports interpolation.  The lexer
      recognizes interpolation tokens unconditionally.  The parser rejects
      them when interpolation is disabled, in parse_interpolation().
      However, it neglects to set an error then, which can make
      json_parser_parse() fail without setting an error.
      
      Move the check for unwanted interpolation from the parser's
      parse_interpolation() into the lexer's finite state machine.  When
      interpolation is disabled, '%' is now handled like any other
      unexpected character.
      
      The next commit will improve how such lexical errors are handled.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-39-armbru@redhat.com>
    • json: Rename token JSON_ESCAPE & friends to JSON_INTERP · 61030280
      Markus Armbruster committed
      The JSON parser optionally supports interpolation.  The code calls it
      "escape".  Awkward, because it uses the same term for escape sequences
      within strings.  The latter usage is consistent with RFC 8259 "The
      JavaScript Object Notation (JSON) Data Interchange Format" and ISO C.
      Call the former "interpolation" instead.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-38-armbru@redhat.com>
    • json: Don't create JSON_ERROR tokens that won't be used · 269e57ae
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-37-armbru@redhat.com>
    • json: Don't pass null @tokens to json_parser_parse() · ff281a27
      Markus Armbruster committed
      json_parser_parse() normally returns the QObject on success.  Except
      it returns null when its @tokens argument is null.
      
      Its only caller json_message_process_token() passes null @tokens when
      emitting a lexical error.  The call is a rather opaque way to say json
      = NULL then.
      
      Simplify matters by lifting the assignment to json out of the emit
      path: initialize json to null, set it to the value of
      json_parser_parse() when there's no lexical error.  Drop the special
      case from json_parser_parse().
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-36-armbru@redhat.com>
    • json: Redesign the callback to consume JSON values · 62815d85
      Markus Armbruster committed
      The classical way to structure parser and lexer is to have the client
      call the parser to get an abstract syntax tree, the parser call the
      lexer to get the next token, and the lexer call some function to get
      input characters.
      
      Another way to structure them would be to have the client feed
      characters to the lexer, the lexer feed tokens to the parser, and the
      parser feed abstract syntax trees to some callback provided by the
      client.  This way is more easily integrated into an event loop that
      dispatches input characters as they arrive.
      
      Our JSON parser is kind of between the two.  The lexer feeds tokens to
      a "streamer" instead of a real parser.  The streamer accumulates
      tokens until it got the sequence of tokens that comprise a single JSON
      value (it counts curly braces and square brackets to decide).  It
      feeds those token sequences to a callback provided by the client.  The
      callback passes each token sequence to the parser, and gets back an
      abstract syntax tree.
      
      I figure it was done that way to make a straightforward recursive
      descent parser possible.  "Get next token" becomes "pop the first
      token off the token sequence".  Drawback: we need to store a complete
      token sequence.  Each token eats 13 bytes, plus its input characters,
      plus malloc overhead.
      
      Observations:
      
      1. This is not the only way to use recursive descent.  If we replaced
         "get next token" by a coroutine yield, we could do without a
         streamer.
      
      2. The lexer reports errors by passing a JSON_ERROR token to the
         streamer.  This communicates the offending input characters and
         their location, but no more.
      
      3. The streamer reports errors by passing a null token sequence to the
         callback.  The (already poor) lexical error information is thrown
         away.
      
      4. Having the callback receive a token sequence duplicates the code to
         convert token sequence to abstract syntax tree in every callback.
      
      5. Known bug: the streamer silently drops incomplete token sequences.
      
      This commit rectifies 4. by lifting the call of the parser from the
      callbacks into the streamer.  Later commits will address 3. and 5.
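      
      The shape of the change, with stand-in type names rather than QEMU's
      actual signatures:
      
          typedef struct TokenList TokenList;   /* stand-in for the token queue */
          typedef struct Node Node;             /* stand-in for the syntax tree */
      
          /* Before: every callback had to run the parser on the tokens
           * itself, duplicating the token-sequence-to-tree conversion. */
          typedef void (*ConsumeTokens)(void *opaque, TokenList *tokens);
      
          /* After: the streamer runs the parser; callbacks just consume the
           * resulting tree or an error. */
          typedef void (*ConsumeNode)(void *opaque, Node *ast, const char *err);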
      
      The lifting removes a bug from qjson.c's parse_json(): it passed a
      pointer to a non-null Error * in certain cases, as demonstrated by
      check-qjson.c.
      
      json_parser_parse() is now unused.  It's a stupid wrapper around
      json_parser_parse_err().  Drop it, and rename json_parser_parse_err()
      to json_parser_parse().
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-35-armbru@redhat.com>
    • json: Have lexer call streamer directly · 037f2440
      Markus Armbruster committed
      json_lexer_init() takes the function to process a token as an
      argument.  It's always json_message_process_token().  Makes the code
      harder to understand for no actual gain.  Drop the indirection.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-34-armbru@redhat.com>
    • json-parser: simplify and avoid JSONParserContext allocation · e8b19d7d
      Marc-André Lureau committed
      parser_context_new/free() are only used from json_parser_parse(). We
      can fold the code there and avoid an allocation altogether.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180719184111.5129-9-marcandre.lureau@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20180823164025.12553-33-armbru@redhat.com>
    • json: remove useless return value from lexer/parser · 7c1e1d54
      Marc-André Lureau committed
      The lexer always returns 0 when fed characters.  Furthermore, none of
      the callers care about the return value.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180326150916.9602-10-marcandre.lureau@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20180823164025.12553-32-armbru@redhat.com>
    • check-qjson: Fix and enable utf8_string()'s disabled part · c473c379
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-31-armbru@redhat.com>
    • json: Fix \uXXXX for surrogate pairs · dc45a07c
      Markus Armbruster committed
      The JSON parser treats each half of a surrogate pair as unpaired
      surrogate.  Fix it to recognize surrogate pairs.
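      
      For reference, the arithmetic for combining a surrogate pair into a
      code point (standalone example, not QEMU's parse_string()):
      
          #include <stdio.h>
      
          int main(void)
          {
              unsigned hi = 0xD83D;   /* high (leading) surrogate: D800..DBFF */
              unsigned lo = 0xDE00;   /* low (trailing) surrogate: DC00..DFFF */
              unsigned cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
      
              printf("U+%04X\n", cp); /* "\uD83D\uDE00" decodes to U+1F600 */
              return 0;
          }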
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-30-armbru@redhat.com>
    • json: Reject invalid \uXXXX, fix \u0000 · 46a628b1
      Markus Armbruster committed
      The JSON parser translates invalid \uXXXX to garbage instead of
      rejecting it, and swallows \u0000.
      
      Fix by using mod_utf8_encode() instead of flawed wchar_to_utf8().
      
      Valid surrogate pairs are now differently broken: they're rejected
      instead of translated to garbage.  The next commit will fix them.
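      
      A sketch of the encoding involved: standard UTF-8, except that U+0000
      becomes the overlong sequence \xC0\x80 ("modified UTF-8").
      Illustrative only, not QEMU's mod_utf8_encode():
      
          /* Assumes @cp is a valid Unicode scalar value; returns the number
           * of bytes written to @out. */
          static int encode_mod_utf8(unsigned cp, unsigned char out[4])
          {
              if (cp == 0) {                      /* the one overlong case */
                  out[0] = 0xC0; out[1] = 0x80;
                  return 2;
              }
              if (cp < 0x80) {
                  out[0] = cp;
                  return 1;
              }
              if (cp < 0x800) {
                  out[0] = 0xC0 | (cp >> 6);
                  out[1] = 0x80 | (cp & 0x3F);
                  return 2;
              }
              if (cp < 0x10000) {
                  out[0] = 0xE0 | (cp >> 12);
                  out[1] = 0x80 | ((cp >> 6) & 0x3F);
                  out[2] = 0x80 | (cp & 0x3F);
                  return 3;
              }
              out[0] = 0xF0 | (cp >> 18);
              out[1] = 0x80 | ((cp >> 12) & 0x3F);
              out[2] = 0x80 | ((cp >> 6) & 0x3F);
              out[3] = 0x80 | (cp & 0x3F);
              return 4;
          }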
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-29-armbru@redhat.com>
    • json: Simplify parse_string() · de6decfe
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-28-armbru@redhat.com>
    • json: Leave rejecting invalid escape sequences to parser · b2da4a4d
      Markus Armbruster committed
      Both lexer and parser reject invalid escape sequences in strings.  The
      parser's check is useless.
      
      The lexer ends the token right after the first non-well-formed byte.
      This tends to lead to suboptimal error reporting.  For instance, input
      
          {"abc\@ijk": 1}
      
      produces the tokens
      
          JSON_LCURLY   {
          JSON_ERROR    "abc\@
          JSON_KEYWORD  ijk
          JSON_ERROR   ": 1}\n
      
      The parser then reports three errors
      
          Invalid JSON syntax
          JSON parse error, invalid keyword 'ijk'
          Invalid JSON syntax
      
      before it recovers at the newline.
      
      Drop the lexer's escape sequence checking, and make it accept the same
      characters after backslash it accepts elsewhere in strings.  It now
      produces
      
          JSON_LCURLY   {
          JSON_STRING   "abc\@ijk"
          JSON_COLON    :
          JSON_INTEGER  1
          JSON_RCURLY   }
      
      and the parser reports just
      
          JSON parse error, invalid escape sequence in string
      
      While there, fix parse_string()'s inaccurate function comment.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-27-armbru@redhat.com>
    • json: Accept overlong \xC0\x80 as U+0000 ("modified UTF-8") · 4b1c0cd7
      Markus Armbruster committed
      Since the JSON grammar doesn't accept U+0000 anywhere, this merely
      exchanges one kind of parse error for another.  It's purely for
      consistency with qobject_to_json(), which accepts \xC0\x80 (see commit
      e2ec3f97).
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-26-armbru@redhat.com>
    • json: Leave rejecting invalid UTF-8 to parser · de930f45
      Markus Armbruster committed
      Both the lexer and the parser (attempt to) validate UTF-8 in JSON
      strings.
      
      The lexer rejects bytes that can't occur in valid UTF-8: \xC0..\xC1,
      \xF5..\xFF.  This rejects some, but not all invalid UTF-8.  It also
      rejects ASCII control characters \x00..\x1F, in accordance with RFC
      8259 (see recent commit "json: Reject unescaped control characters").
      
      When the lexer rejects, it ends the token right after the first bad
      byte.  Good when the bad byte is a newline.  Not so good when it's
      something like an overlong sequence in the middle of a string.  For
      instance, input
      
          {"abc\xC0\xAFijk": 1}\n
      
      produces the tokens
      
          JSON_LCURLY   {
          JSON_ERROR    "abc\xC0
          JSON_ERROR    \xAF
          JSON_KEYWORD  ijk
          JSON_ERROR   ": 1}\n
      
      The parser then reports four errors
      
          Invalid JSON syntax
          Invalid JSON syntax
          JSON parse error, invalid keyword 'ijk'
          Invalid JSON syntax
      
      before it recovers at the newline.
      
      The commit before previous made the parser reject invalid UTF-8
      sequences.  Since then, anything the lexer rejects, the parser would
      reject as well.  Thus, the lexer's rejecting is unnecessary for
      correctness, and harmful for error reporting.
      
      However, we want to keep rejecting ASCII control characters in the
      lexer, because that produces the behavior we want for unclosed
      strings.
      
      We also need to keep rejecting \xFF in the lexer, because we
      documented that as a way to reset the JSON parser
      (docs/interop/qmp-spec.txt section 2.6 QGA Synchronization), which
      means we can't change how we recover from this error now.  I wish we
      hadn't done that.
      
      I think we should treat \xFE the same as \xFF.
      
      Change the lexer to accept \xC0..\xC1 and \xF5..\xFD.  It now rejects
      only \x00..\x1F and \xFE..\xFF.  Error reporting for invalid UTF-8 in
      strings is much improved, except for \xFE and \xFF.  For the example
      above, the lexer now produces
      
          JSON_LCURLY   {
          JSON_STRING   "abc\xC0\xAFijk"
          JSON_COLON    :
          JSON_INTEGER  1
          JSON_RCURLY
      
      and the parser reports just
      
          JSON parse error, invalid UTF-8 sequence in string
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-25-armbru@redhat.com>
    • json: Report first rather than last parse error · 574bf16f
      Markus Armbruster committed
      Quiz time!  When a parser reports multiple errors, but the user gets
      to see just one, which one is (on average) the least useful one?
      
      Yes, you're right, it's the last one!  You're clearly familiar with
      compilers.
      
      Which one does QEMU report?
      
      Right again, the last one!  You're clearly familiar with QEMU.
      
      Reproducer: feeding
      
          {"abc\xC2ijk": 1}\n
      
      to QMP produces
      
          {"error": {"class": "GenericError", "desc": "JSON parse error, key is not a string in object"}}
      
      Report the first error instead.  The reproducer now produces
      
          {"error": {"class": "GenericError", "desc": "JSON parse error, invalid UTF-8 sequence in string"}}
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-24-armbru@redhat.com>
    • json: Reject invalid UTF-8 sequences · e59f39d4
      Markus Armbruster committed
      We reject bytes that can't occur in valid UTF-8 (\xC0..\xC1,
      \xF5..\xFF) in the lexer.  That's insufficient; there's plenty of
      invalid UTF-8 not containing these bytes, as demonstrated by
      check-qjson:
      
      * Malformed sequences
      
        - Unexpected continuation bytes
      
        - Missing continuation bytes after start bytes other than
          \xC0..\xC1, \xF5..\xFD.
      
      * Overlong sequences with start bytes other than \xC0..\xC1,
        \xF5..\xFD.
      
      * Invalid code points
      
      Fixing this in the lexer would be bothersome.  Fixing it in the parser
      is straightforward, so do that.
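      
      A sketch of the kind of check the parser can apply to a decoded
      sequence (illustrative only, not the actual patch):
      
          /* A sequence is invalid if it is malformed, overlong, a surrogate
           * code point, or beyond Unicode's last code point. */
          static int valid_code_point(unsigned cp, int seq_len)
          {
              static const unsigned min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
      
              if (seq_len < 1 || seq_len > 4) {
                  return 0;               /* malformed sequence */
              }
              if (cp < min_for_len[seq_len]) {
                  return 0;               /* overlong encoding */
              }
              if (cp >= 0xD800 && cp <= 0xDFFF) {
                  return 0;               /* surrogate code point */
              }
              return cp <= 0x10FFFF;      /* reject beyond U+10FFFF */
          }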
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-23-armbru@redhat.com>
    • check-qjson: Document we expect invalid UTF-8 to be rejected · a89d3104
      Markus Armbruster committed
      The JSON parser rejects some invalid sequences, but accepts others
      without correcting the problem.
      
      We should either reject all invalid sequences, or minimize overlong
      sequences and replace all other invalid sequences by a suitable
      replacement character.  A common choice for replacement is U+FFFD.
      
      I'm going to implement the former.  Update the comments in
      utf8_string() to expect this.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-22-armbru@redhat.com>
    • json: Tighten and simplify qstring_from_escaped_str()'s loop · 00ea57fa
      Markus Armbruster committed
      Simplify loop control, and assert that the string ends with the
      appropriate quote (the lexer ensures it does).
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-21-armbru@redhat.com>
    • json: Revamp lexer documentation · eddc0a7f
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-20-armbru@redhat.com>
    • json: Reject unescaped control characters · 340db1ed
      Markus Armbruster committed
      Fix the lexer to reject unescaped control characters in JSON strings,
      in accordance with RFC 8259 "The JavaScript Object Notation (JSON)
      Data Interchange Format".
      
      Bonus: we now recover more nicely from unclosed strings.  E.g.
      
          {"one: 1}\n{"two": 2}
      
      now recovers cleanly after the newline, where before the lexer
      remained confused until the next unpaired double quote or lexical
      error.
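      
      The rule the lexer now enforces, sketched as a predicate (illustrative
      only, not QEMU's lexer table):
      
          /* Per RFC 8259, a byte may appear literally (unescaped) inside a
           * JSON string only if it is not a control character and is not
           * the string delimiter or the escape character.  Anything in
           * 0x00..0x1F, including newline, ends the token as an error. */
          static int literal_ok_in_string(unsigned char c)
          {
              return c >= 0x20 && c != '"' && c != '\\';
          }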
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-19-armbru@redhat.com>
    • json: Fix lexer to include the bad character in JSON_ERROR token · a2ec6be7
      Markus Armbruster committed
      json_lexer[] maps (lexer state, input character) to the new lexer
      state.  The input character is consumed unless the new state is
      terminal and the input character doesn't belong to this token,
      i.e. the state transition uses look-ahead.  When this is the case,
      input character '\0' would result in the same state transition.
      TERMINAL_NEEDED_LOOKAHEAD() exploits this.
      
      Except this is wrong for transitions to IN_ERROR.  There, the
      offending input character is in fact consumed: case IN_ERROR returns.
      It isn't added to the JSON_ERROR token, though.
      
      Fix that by making TERMINAL_NEEDED_LOOKAHEAD() return false for
      transitions to IN_ERROR.
      
      There's a slight complication.  json_lexer_flush() passes input
      character '\0' to flush an incomplete token.  If this results in
      JSON_ERROR, we'd now add the '\0' to the token.  Suppress that.
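      
      A sketch of the heuristic and the fix, with a hypothetical state table
      (not QEMU's json_lexer[] or TERMINAL_NEEDED_LOOKAHEAD()):
      
          enum { IN_ERROR = 1 };      /* hypothetical state number */
      
          /* A transition into a terminal state used look-ahead iff feeding
           * '\0' from the old state reaches the same state.  The fix:
           * transitions into the error state never need look-ahead, because
           * the offending character really was consumed. */
          static int terminal_needed_lookahead(const unsigned char table[][256],
                                               int old_state, int new_state)
          {
              if (new_state == IN_ERROR) {
                  return 0;
              }
              return table[old_state][0] == new_state;  /* '\0' transition */
          }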
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-18-armbru@redhat.com>
    • check-qjson: Cover interpolation more thoroughly · 2e933f57
      Markus Armbruster committed
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-17-armbru@redhat.com>
    • check-qjson qmp-test: Cover control characters more thoroughly · 6bc93a34
      Markus Armbruster committed
      RFC 8259 "The JavaScript Object Notation (JSON) Data Interchange
      Format" requires control characters in strings to be escaped.
      Demonstrate the JSON parser accepts U+0001 .. U+001F unescaped.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-16-armbru@redhat.com>
    • check-qjson: Fix utf8_string() to test all invalid sequences · 5f454e66
      Markus Armbruster committed
      Some of utf8_string()'s test_cases[] contain multiple invalid
      sequences.  Testing that qobject_from_json() fails only tests we
      reject at least one invalid sequence.  That's incomplete.
      
      Additionally test each non-space sequence in isolation.
      
      This demonstrates that the JSON parser accepts invalid sequences
      starting with \xC2..\xF4.  Add a FIXME comment.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-15-armbru@redhat.com>
    • check-qjson: Simplify utf8_string() · 32846e93
      Markus Armbruster committed
      The previous commit made utf8_string()'s test_cases[].utf8_in
      superfluous: we can use .json_in instead.  Except for the case testing
      U+0000.  \x00 doesn't work in C strings, so it tests \\u0000 instead.
      But testing \\uXXXX is escaped_string()'s job.  It's covered there.
      Test U+0001 here, and drop .utf8_in.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-14-armbru@redhat.com>
    • check-qjson: Cover UTF-8 in single quoted strings · 6ad8444f
      Markus Armbruster committed
      utf8_string() tests only double quoted strings.  Cover single quoted
      strings, too: store the strings to test without quotes, then wrap them
      in either kind of quote.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-13-armbru@redhat.com>
    • check-qjson: Consolidate partly redundant string tests · 069946f4
      Markus Armbruster committed
      simple_string() and single_quote_string() have become redundant with
      escaped_string(), except for embedded single and double quotes.
      Replace them by a test that covers just that.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-12-armbru@redhat.com>
    • check-qjson: Cover escaped characters more thoroughly, part 2 · e0fe2a97
      Markus Armbruster committed
      Cover escaped single quote, surrogates, invalid escapes, and
      noncharacters.  This demonstrates that valid surrogate pairs are
      misinterpreted, and invalid surrogates and noncharacters aren't
      rejected.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-11-armbru@redhat.com>
    • check-qjson: Streamline escaped_string()'s test strings · f3cfdd3a
      Markus Armbruster committed
      Merge a few closely related test strings, and drop a few redundant
      ones.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-10-armbru@redhat.com>
    • check-qjson: Cover escaped characters more thoroughly, part 1 · 4e1df9b7
      Markus Armbruster committed
      escaped_string() first tests double quoted strings, then repeats a few
      tests with single quotes.  Repeat all of them: store the strings to
      test without quotes, and wrap them in either kind of quote for
      testing.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Message-Id: <20180823164025.12553-9-armbru@redhat.com>