NOTE: This is my first parser and I pushed it to GitHub just to showcase the project. It is not production-ready in any sense.
This is a small project showcasing the parsing process of JSON syntax and how a parser can turn JSON text into an object that JavaScript can read.
The parser was built based on this document.
This parser can parse the following types:
- `string`
- `number`
- `object`
- `array`
- `true` (in the parser this is called `bool`)
- `false` (in the parser this is called `bool`)
- `null`
The stages will be showcased using the following JSON syntax:
{
"key": "value"
}

Tokenization is the first process executed here. Tokenization splits the content (a string) into individual characters, which are then classified using different identifiers. The list of identifiers is:
- `WHITESPACE`
- `OBJECT_START`
- `OBJECT_END`
- `STRING_START`
- `STRING_CONTENT`
- `STRING_END`
- `COLON`
- `NUMBER`
- `COMMA`
- `DOT`
- `ARRAY_START`
- `ARRAY_END`
- `BOOL`
- `UNKNOWN`
- `NULL`
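As a rough illustration, per-character classification along these lines might look like the following. This is a hypothetical, simplified sketch, not the project's actual code; `BOOL` and `NULL` detection is omitted for brevity, and the function names are assumptions:

```javascript
// Hypothetical sketch of per-character classification using the
// identifiers listed above. BOOL and NULL detection is omitted.
function classifyChar(ch, insideString) {
  // Anything between STRING_START and STRING_END is string content.
  if (insideString && ch !== '"') return "STRING_CONTENT";
  switch (ch) {
    case "{": return "OBJECT_START";
    case "}": return "OBJECT_END";
    case "[": return "ARRAY_START";
    case "]": return "ARRAY_END";
    case ":": return "COLON";
    case ",": return "COMMA";
    case ".": return "DOT";
    case '"': return insideString ? "STRING_END" : "STRING_START";
    default:
      if (/\s/.test(ch)) return "WHITESPACE";
      if (/[0-9-]/.test(ch)) return "NUMBER";
      return "UNKNOWN";
  }
}

// Walk the input one character at a time, emitting a token with
// start/end offsets, the raw character, and its identifier.
function tokenize(content) {
  const tokens = [];
  let insideString = false;
  for (let i = 0; i < content.length; i++) {
    const raw = content[i];
    const identifier = classifyChar(raw, insideString);
    if (identifier === "STRING_START") insideString = true;
    if (identifier === "STRING_END") insideString = false;
    tokens.push({ start: i, end: i + 1, raw, identifier });
  }
  return tokens;
}
```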
These identifiers are used to classify each character. For the example above, the tokenization output would be:
[
{
"start": 0,
"end": 1,
"raw": "{",
"identifier": "OBJECT_START"
},
{
"start": 1,
"end": 2,
"raw": "\r",
"identifier": "WHITESPACE",
"child": true
},
...WHITESPACE
{
"start": 7,
"end": 8,
"raw": "\"",
"identifier": "STRING_START",
"child": true
},
...STRING_CONTENT
{
"start": 10,
"end": 11,
"raw": "y",
"identifier": "STRING_CONTENT",
"child": true
},
{
"start": 11,
"end": 12,
"raw": "\"",
"identifier": "STRING_END",
"child": true
},
{
"start": 12,
"end": 13,
"raw": ":",
"identifier": "COLON",
"child": true
},
{
"start": 13,
"end": 14,
"raw": " ",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 14,
"end": 15,
"raw": "\"",
"identifier": "STRING_START",
"child": true
},
...STRING_CONTENT
{
"start": 19,
"end": 20,
"raw": "e",
"identifier": "STRING_CONTENT",
"child": true
},
{
"start": 20,
"end": 21,
"raw": "\"",
"identifier": "STRING_END",
"child": true
},
{
"start": 21,
"end": 22,
"raw": "\r",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 22,
"end": 23,
"raw": "\n",
"identifier": "WHITESPACE",
"child": true
},
{
"start": 23,
"end": 24,
"raw": "}",
"identifier": "OBJECT_END"
}
]

Lexing is the second process. It is fed the `Token[]` array generated by the tokenization process and returns an `Entity[]` array containing the different entities. An entity represents connected tokens that together produce one type. For example, if `STRING_START`, `STRING_CONTENT`, and `STRING_END` tokens are registered, the entity generated is of type `string`.
Entities can be of type:
- `string`
- `object`
- `number`
- `array`
- `bool`
- `null`
- `whitespace`
- `colon`
- `comma`
- `unknown`
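To illustrate the idea of merging connected tokens into one entity, here is a hypothetical sketch. Only the `string` case is shown, it assumes a well-formed token stream, and the function name is an assumption:

```javascript
// Hypothetical sketch: collapse a STRING_START / STRING_CONTENT /
// STRING_END run of tokens into a single "string" entity. Other entity
// types would be handled in the same loop; omitted here for brevity.
function lex(tokens) {
  const entities = [];
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i].identifier === "STRING_START") {
      let value = "";
      i++; // step past STRING_START
      while (tokens[i].identifier === "STRING_CONTENT") {
        value += tokens[i].raw;
        i++;
      }
      // tokens[i] is now STRING_END; the loop's i++ steps past it.
      entities.push({ type: "string", value });
    }
  }
  return entities;
}
```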
In this case the lexer output would be:
[
{
"type": "object",
"children": [
{
"type": "whitespace",
"value": "\r\n "
},
{
"type": "whitespace",
"value": "\n "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "whitespace",
"value": " "
},
{
"type": "string",
"value": "key"
},
{
"type": "colon",
"value": ":"
},
{
"type": "whitespace",
"value": " "
},
{
"type": "string",
"value": "value"
},
{
"type": "whitespace",
"value": "\r\n"
},
{
"type": "whitespace",
"value": "\n"
}
]
}
]

AST generation is the third process. It is fed the `Entity[]` array and generates an `ASTNode[]` array that can finally be recognized by the last process.
An AST node can be one of the following types:

- `object`
- `string`
- `array`
- `number`
- `bool` (`true` | `false`)
- `null`
A node can also have different optional data fields:

- `children` (only for parent-based types, in this case `array` and `object`)
- `key`
- `value`

The `type` field is required and always defined.
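As a rough sketch of how entities might be combined into AST nodes (hypothetical; it handles only flat key/value pairs, while the real AST step also nests `object` and `array` children):

```javascript
// Hypothetical sketch: pair each key entity with the value entity that
// follows the colon, producing flat AST nodes. Whitespace entities are
// dropped first; nesting of objects/arrays is not handled here.
function constructAST(entities) {
  const meaningful = entities.filter((e) => e.type !== "whitespace");
  const nodes = [];
  for (let i = 0; i < meaningful.length; i++) {
    if (meaningful[i].type === "colon") {
      nodes.push({
        type: meaningful[i + 1].type,   // type of the value entity
        key: meaningful[i - 1].value,   // entity before the colon
        value: meaningful[i + 1].value, // entity after the colon
      });
    }
  }
  return nodes;
}
```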
In this case the AST output would be:
[
{
"type": "object",
"children": [
{
"type": "string",
"value": "value",
"key": "key"
}
]
}
]

The fourth and final process parses the `ASTNode[]` array and converts the node types to native JavaScript types.
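This conversion step can be sketched as a recursive walk over the AST. The following is a hypothetical illustration; the function name is an assumption, not the project's API:

```javascript
// Hypothetical sketch: recursively convert an AST node into the
// corresponding native JavaScript value.
function toNative(node) {
  switch (node.type) {
    case "object": {
      const obj = {};
      for (const child of node.children) obj[child.key] = toNative(child);
      return obj;
    }
    case "array":
      return node.children.map(toNative);
    case "number":
      return Number(node.value);
    case "bool":
      return node.value === "true";
    case "null":
      return null;
    default: // "string"
      return node.value;
  }
}
```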
The output is:
{ key: 'value' }

You can now access the `key` field inside this object.
First, install all packages using `npm i`, then run `npm start`.
Parses JSON from a file.

Args:

- `path` -> `string` - Defines the path of the JSON file

Parses JSON from a string.

Args:

- `content` -> `string` - Defines a JSON-syntax-based string
To access the different processes, you can import the Tokenizer, Lexer, AST, and Parser classes. Each of them has a parse method (except AST, which has construct) that accepts the result of the previous process as its argument.
To visualize every process better, use `JSON.stringify(<object>, null, 2)`.
