The focus of the second part will be expression parsing. The desired grammar for the expression small language experiment is the following (here in pseudo BNF):
NESTED_EXPRESSION = '(' EXPRESSION ')'
PRIMARY_EXPRESSION = SYMBOL  NUMBER  STRING  NESTED
UNARY_EXPRESSION = NOT_EXPRESSION  CALL_EXPRESSION  ARRAY_ACCESS_EXPRESSION 
PRIMARY_EXPRESSION
MULTIPLICATIVE_EXPRESSION = UNARY_EXPRESSION ( ('*'  '/') UNARY_EXPRESSION)*
ADDITIVE_EXPRESSION = MULTIPLICATIVE_EXPRESSION ( ('*'  '/') MULTIPLICATIVE_EXPRESSION)*
RELATIONAL_EXPRESSION = ADDITIVE_EXPRESSION ( ('<'  '>') ADDITIVE_EXPRESSION)*
EQUALITY_EXPRESSION = RELATIONAL_EXPRESSION ( ('='  '<>') RELATIONAL_EXPRESSION)*
LOGICAL_OR_EXPRESSION = EQUALITY_EXPRESSION ('' EQUALITY_EXPRESSION)*
LOGICAL_AND_EXPRESSION = LOGICAL_OR_EXPRESSION ('&&' LOGICAL_OR_EXPRESSION)*
EXPRESSION = LOGICAL_AND_EXPRESSION
In this post we're going to build some key techniques for creating the code for parsing a subset of this grammar. Not every element of expression grammar will be presented since it gets repetitive at some point.
For this small experiment we're going to use the following F# types as the AST of the program:
type Operator =
 Plus
 Minus
 Times
 Div
 And
 Or
 Equal
 NotEqual
 Assign
 Lt
 Gt
type PExpr =
 PSymbol of string
 PString of string
 PNumber of string
 PBoolean of bool
 PNot of PExpr
 PCall of string * (PExpr list)
 PNested of PExpr
 PArrayAccess of PExpr * PExpr
 PBinaryOperation of Operator * PExpr * PExpr
Simple atomic expressions
We're going to start with the atomic expressions:
 Symbols or variables (ex.
x
,y
,z
)  Number literals (ex
1
,1.2
)  String literals (ex.
"Hello"
)
We already got these parsers from the previous post:
let pSymbol =
concatParsers2
(readWithConditionOnChar (fun c > System.Char.IsLetter(c, 0)))
(fun initialChar >
concatParsers2
(readZeroOrMoreChars (fun c > System.Char.IsLetter(c)  System.Char.IsDigit(c)))
(fun suffixString > (preturn (PSymbol (initialChar + suffixString))))
)
let pNumber =
( (optionalP (readSpecificChar '') "") >>= (fun neg >
digitP >>= (fun firstChar >
(readZeroOrMoreChars (fun c > System.Char.IsDigit(c))) >>= (fun chars >
(optionalP decimalPartP "") >>= (fun dec >
preturn (PNumber (neg + firstChar + chars + dec)))))))
let pString =
whitespace >>
readSpecificChar '"' >>
readZeroOrMoreStringChars (fun previous current >
(previous = '\\' && current = '"')  current <> '"')
>>= (fun stringContents >
readSpecificChar '"' >> (preturn (PString stringContents)))
We can call these expressions "primary expressions", which are used in conjunction with operators to create more complex elements. We need a parser that recognizes any one these elements. We can use the disjParser
from the previous post :
> let primaryExpression = (disjParser (disjParser pSymbol pNumber) pString);;
val primaryExpression : (ReaderState > (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})
> parse "\"hello\"" myPrimaryExpression;;
val it : (PExpr * ReaderState) option =
Some (PString "hello", {Data = ""hello"";
Position = 7;
Indentation = [0];})
We can use List.reduce
function to improve this code since we could have several parsers as the primary expression.
> let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString] ;;
val myPrimaryExpression : (ReaderState > (PExpr * ReaderState) option)
> parse "10" myPrimaryExpression
 ;;
val it : (PExpr * ReaderState) option =
Some (PNumber "10", {Data = "10";
Position = 2;
Indentation = [0];})
> parse "x1" myPrimaryExpression
 ;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x1", {Data = "x1";
Position = 2;
Indentation = [0];})
Binary operations
Binary expressions are composed of two expressions and an operator which connects them. For example: a + b
. For these operations we can define the following utility parser creation function:
let buildExpressions (leftExpr:PExpr) (rightExprs:(Operator * PExpr) list) =
(List.fold (fun left (op,right) > PBinaryOperation(op, left, right)) leftExpr rightExprs)
let pBinaryExpression operators lowerLevelElementParser =
lowerLevelElementParser
>>= (fun leftTerm >
(zeroOrMore
(operators
>>= (fun op >
lowerLevelElementParser >>= (fun expr > preturn (op, expr))))
[])
>>= (fun acc > preturn (buildExpressions leftTerm acc) ))
This function allows us to build parsers for binary expressions . The same expressed in pseudo BNF may look like this:
pBinaryExpression = lowerLevelElementParser ( operators lowerLevelElementParser )*
Which means that we'll like to recognize an element with lowerLevelElementParser
and zero or more pairs of an element recognized by operators
followed by another lowerLevelElementParser
. What looks strange at first is the use of zero or more
which implies that this parser can recognize just one element with lowerLevelElementParser
. This will be useful in the following section when we try to implement operator precedence.
As an example we can define a parser for plus
as follows:
> let identifyOperator operatorChar operatorResult =
 concatParsers
 whitespaceNoNl
 ((readSpecificChar operatorChar) >> (preturn operatorResult))
 ;;
val identifyOperator :
operatorChar:char >
operatorResult:'a > (ReaderState > ('a * ReaderState) option)
> let plusOperator = identifyOperator '+' Plus;;
val plusOperator : (ReaderState > (Operator * ReaderState) option)
> let myPlus = pBinaryExpression plusOperator myPrimaryExpression;;
val myPlus : (ReaderState > (PExpr * ReaderState) option)
> parse "1+2" myPlus;;
val it : (PExpr * ReaderState) option =
Some (PBinaryOperation (Plus,PNumber "1",PNumber "2"), {Data = "1+2";
Position = 3;
Indentation = [0];})
Arithmetic operations
Now we're going to add support for basic arithmetic operations: +
, 
, /
, *
. By using the pBinaryExpression
utility function defined in the previous section we can preserve operator precedence. To do this we define the operations as follows:
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
What we say here is that an additive expression
is composed of addition or subtraction (disjParser plusOperator minusOperator)
of multiplicative expressions
. Also we said that a multiplicative expression
is composed of multiplication or division (disjParser pDivOperator pTimesOperator)
of primary expressions.
An example of using these operators looks like this:
> parse "1 * 2 + 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PBinaryOperation (Times,PNumber "1",PNumber "2"),PNumber "3"),
{Data = "1 * 2 + 3";
Position = 9;
Indentation = [0];})
> parse "1 + 2 * 3" pAdditiveExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Plus,PNumber "1",PBinaryOperation (Times,PNumber "2",PNumber "3")),
{Data = "1 + 2 * 3";
Position = 9;
Indentation = [0];})
Recursion
One thing that was tricky to implement at first was recursion. At some point in the grammar we need to refer back to the top parser . For example a parentheses expression can have any other expression and its recognized as a primary expression. For example:
1 * (2 + 3)
Needs to be parsed as:
*
/ \
/ \
1 (2+3)

+
/ \
2 3
What we need to do is to define a top level pExpression
parser used for recognizing all our expressions . This parser will be used to define the parenthesis or nested expression. At this point our grammar written using pseudo BNF looks like this:
pPrimaryExpression = pNumber  pSymbol  pString
pAdditiveExpression = pPrimaryExpression ( (plusOperator  minusOperator) pPrimaryExpression )*
pMultiplicativeExpression = pAdditiveExpression ( (divOperator  timesOperator) pPrimaryExpression )*
pTopExpression = pMultiplicativeExpression
In F# this using our set of combinators it looks like:
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpression
We can use pTopExpression
to parse any expression:
> parse "3+10*x1/32" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Minus,
PBinaryOperation
(Plus,PNumber "3",PBinaryOperation (Times,PNumber "10",PSymbol "x")),
PBinaryOperation (Div,PNumber "1",PNumber "32")),
{Data = "3+10*x1/32";
Position = 11;
Indentation = [0];})
> parse "x" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some (PSymbol "x", {Data = "x";
Position = 1;
Indentation = [0];})
But now we want to use pTopExpression
to define one of our primary expressions:
//WARNING THE FOLLOWING CODE WILL NOT COMPILE
let readLPar =
concatParsers whitespace (readSpecificChar '(')
let readRPar = readSpecificChar ')'
let pNested = readLPar >>
pTopExpression >>= (fun expr >
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
let pTopExpression = pAdditiveExpression
When we try to compile this code we get the following error:
pTopExpression >>= (fun expr >
^^^^^^^^^^^^^^
/dir/stdin(6,15): error FS0039: The value or constructor 'pTopExpression' is not defined
Which is totally correct, pTopExpression
is defined after pNested
. It has to because pTopExpression is defined using pAdditiveExpression
has a dependency on pPrimaryExpression. Until now we only used function composition to create our parsers.
What we want to do is to define:
number symbol string nested
^ ^ ^ ^ 
    
    
++++ 
 
 
primary 
 
 
additive 
 
 
multiplicative 
 
 
top <+
To solve this problem we're going to take advantage of reference variables in F#. And do the following trick:
let pTopExpressions = ref []
let pTopExpression =
fun state > (List.reduce disjParser !pTopExpressions) state
let pNested = readLPar >>
pTopExpression >>= (fun expr >
readRPar >> (preturn (PNested expr)))
let myPrimaryExpression = List.reduce disjParser [pSymbol; pNumber; pString; pNested]
let pMultiplicativeExpression = pBinaryExpression (disjParser pDivOperator pTimesOperator) myPrimaryExpression
let pAdditiveExpression = pBinaryExpression (disjParser plusOperator minusOperator) pMultiplicativeExpression
pTopExpressions := [pAdditiveExpression]
Using a reference (ref
) allows us to create the recursive parser that we need . Here's an example of using it:
> parse "1*(2+3)" pTopExpression;;
val it : (PExpr * ReaderState) option =
Some
(PBinaryOperation
(Times,PNumber "1",
PNested (PBinaryOperation (Plus,PNumber "2",PNumber "3"))),
{Data = "1*(2+3)";
Position = 7;
Indentation = [0];})
Next steps
Using the same techniques presented so far we can finish our basic grammar. This is part #2 of an ongoing series of posts on building a parser for a small language. The first post can be found here: http://langexplr.blogspot.com/2017/01/asimplelanguagewithindentation.html.
No comments:
Post a Comment