Parser Generators
Overview of parsers generators and how to use them.
Parser Generators
Contents:
- LR Parsers
- Bison
- Tree-sitter
LR Parsers
- L - the parsers reads the text only once, without backing up
- R - rightmost derivation
- bottom-up parser
- LR(1) - 1 - the number of look-ahead symbols necessary to parse the grammar
Rightmost derivation
- The tree created for an expression would replace the rightmost non-terminal at each step
Bottom-up
The tree is constructed from the bottom - from the leaves to the root.
Algorithm
The parser:
keeps a stack of the parsed sub-trees so far
runs in a loop over the input and on each iteration it does either a shift or a reduce action.
- shift - push the token on the stack
- reduce - pop N subtrees from the stack, make a new tree that has them as children and push it to the stack
Ring any bells?
Ring any bells?
Same as the shunting yard algorithm, but:
- without the operator precedence
- it replaced by the grammar structure
- There all the rules had just two children
To shift or to reduce?
- Depends on the grammar rules.
- The algorithm uses a Finite State Automaton (FSA)
To shift or to reduce?
The FSA has for each (state, next token) -> (action, next state)
- action - whether to shift or reduce
Bison
- Bison generates LR(1) and GLR parsers
- GLR parsers can have arbitrary lookahead and be ambigous
- You don’t always have to convert the grammar in an LR form
- operator precedence and assiociativity are handled for you
Input
%{
/* verbatim copy */
%}
/* definitions */
%%
/* rules */
%%
/* verbatim copy */
Parsing expressions
https://github.com/SofiaCPP/IPL/blob/master/demo/bison/calc/parser.y
Understanding the parser
https://www.gnu.org/software/bison/manual/html_node/Debugging.html#Debugging
bison output for the expression grammar
Tree-sitter
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited
Tree-sitter goals
- General enough to parse any programming language
- Fast enough to parse on every keystroke in a text editor
- Robust enough to provide useful results even in the presence of syntax errors
- Dependency-free so that the runtime library (which is written in pure C) can be embedded in any application