Since the JSON grammar doesn't accept U+0000 anywhere, this merely
exchanges one kind of parse error for another. It's purely for
consistency with qobject_to_json(), which accepts \xC0\x80 (see commit
e2ec3f97680).
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20180823164025.12553-26-armbru@redhat.com>
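
For illustration, a minimal sketch of the modified UTF-8 rule adopted
here (the function below is hypothetical, not QEMU's
mod_utf8_codepoint(), and it only handles one- and two-byte
sequences):

    #include <stdio.h>

    /*
     * Hypothetical decoder sketch: in modified UTF-8, U+0000 is
     * encoded as the overlong two-byte sequence 0xC0 0x80 (invalid
     * in standard UTF-8), so a raw NUL byte never appears inside an
     * encoded string.
     */
    static int sketch_codepoint(const unsigned char *s)
    {
        if (s[0] == 0xC0 && s[1] == 0x80) {
            return 0;                    /* modified UTF-8 for U+0000 */
        }
        if (s[0] && s[0] < 0x80) {
            return s[0];                 /* plain ASCII */
        }
        if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
            int cp = ((s[0] & 0x1F) << 6) | (s[1] & 0x3F);
            return cp >= 0x80 ? cp : -1; /* reject other overlong forms */
        }
        return -1;                       /* raw NUL, stray continuation, ... */
    }

    int main(void)
    {
        printf("%d\n", sketch_codepoint((const unsigned char *)"\xC0\x80")); /* 0 */
        printf("%d\n", sketch_codepoint((const unsigned char *)"\xC2\xA9")); /* 169 = U+00A9 */
        return 0;
    }
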
* interpolation = %((l|ll|I64)[du]|[ipsf])
*
* Note:
- * - Input must be encoded in UTF-8.
+ * - Input must be encoded in modified UTF-8.
* - Decoding and validating is left to the parser.
*/
}
} else {
cp = mod_utf8_codepoint(ptr, 6, &end);
- if (cp <= 0) {
+ if (cp < 0) {
parse_error(ctxt, token, "invalid UTF-8 sequence in string");
goto out;
}
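(Note on the relaxed check above: with \xC0\x80 now decoding to code
point zero, a zero return from mod_utf8_codepoint() means a
successfully decoded U+0000 rather than an error, so only negative
return values still indicate invalid input; hence <= 0 becomes < 0.)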
static void utf8_string(void)
{
/*
- * Problem: we can't easily deal with embedded U+0000. Parsing
- * the JSON string "this \\u0000" is fun" yields "this \0 is fun",
- * which gets misinterpreted as NUL-terminated "this ". We should
- * consider using overlong encoding \xC0\x80 for U+0000 ("modified
- * UTF-8").
- *
* Most test cases are scraped from Markus Kuhn's UTF-8 decoder
* capability and stress test at
* http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt
{
/* \U+0000 */
"\xC0\x80",
- NULL,
+ "\xC0\x80",
"\\u0000",
},
{