Goparsec: For recursive grammar
Input text, sometime, have recursive patterns. For example:
10 * ((20 + 30) * 40)
Let us try to build a plain text calculator, where input is an expression of arithmetic operations on numbers. Operations include addition, subtraction, multiplication and division, in which multiplication and division take higher precedence over addition and subtraction.
expr: term | expr '-' term | expr '+' term
term: factor | term '*' factor | term '/' factor
factor: NUMBER | '(' expr ')'
Above YACC grammar can parse the calculator text. Note that the grammar
definition involves several recursive elements: like, expr
is defined
using expr
. When we try to port this grammar to construct parser combinator
we will end up with infinite recursion . Instead we can re-work the grammar
in such a way that recursive definition of grammar, will quickly be resolved
by presence or absence of a terminal token. Here is an example:
expr -> sum
prod -> value (mulop value)*
mulop -> "*" | "/"
sum -> prod (addop prod)*
addop -> "+" | "-"
value -> num | "(" expr ")"
Above grammar also defines expr
using expr
albeit indirectly. We can
now proceed to construct our simple parsers that can parse terminal tokens.
var openparan = parsec.Token(`\(`, "OPENPARAN")
var closeparan = parsec.Token(`\)`, "CLOSEPARAN")
var addop = parsec.Token(`\+`, "ADD")
var subop = parsec.Token(`-`, "SUB")
var multop = parsec.Token(`\*`, "MULT")
var divop = parsec.Token(`/`, "DIV")
Let us do some trivial composition.
// addop -> "+" | "-"
var sumOp = parsec.OrdChoice(one2one, addop, subop)
// mulop -> "*" | "/"
var prodOp = parsec.OrdChoice(one2one, multop, divop)
Then comes a non-trivial composition, where three compositions value
,
sum
, and prod
refer each other. To handle situation like this
the standard combinators available in goparsec can compose with Parser
and/or *Parser
. This gives user the ability to forward declare
a parser and use its reference for composition, while the actual definition
for that parser can happen later.
// Forward declaration
var prod, sum, value parsec.Parser // circular rats
// value -> "(" expr ")"
var groupExpr = parsec.And(exprNode, openparan, &sum, closeparan)
// (addop prod)*
var prodK = parsec.Kleene(nil, parsec.And(many2many, sumOp, &prod), nil)
// (mulop value)*
var valueK = parsec.Kleene(nil, parsec.And(many2many, prodOp, &value), nil)
And finally we can tie them all together:
// Circular definitions come to life
// sum -> prod (addop prod)*
sum = parsec.And(sumNode, &prod, prodK)
// prod-> value (mulop value)*
prod = parsec.And(prodNode, &value, valueK)
// value -> num | "(" expr ")"
value = parsec.OrdChoice(exprValueNode, intWS(), groupExpr)
// expr -> sum
Y = parsec.OrdChoice(one2one, sum)