The lexer fails to end a valid token when the lookahead character is
beyond '\x7F'. For instance, input
true\xC2\xA2
produces the tokens
JSON_ERROR true\xC2
JSON_ERROR \xA2
This should be
JSON_KEYWORD true
JSON_ERROR \xC2
JSON_ERROR \xA2
instead.
The culprit is
#define TERMINAL(state) [0 ... 0x7F] = (state)
It leaves [0x80..0xFF] zero, i.e. IN_ERROR. Has always been broken.
Fix it to initialize the complete array.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <
20180831075841.13363-2-armbru@redhat.com>
QEMU_BUILD_BUG_ON((int)JSON_MIN <= (int)IN_START_INTERP);
QEMU_BUILD_BUG_ON(IN_START_INTERP != IN_START + 1);
-#define TERMINAL(state) [0 ... 0x7F] = (state)
+#define TERMINAL(state) [0 ... 0xFF] = (state)
/* Return whether TERMINAL is a terminal state and the transition to it
from OLD_STATE required lookahead. This happens whenever the table