Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Writing a Compiler in C#: Parsing, Part 2

DZone's Guide to

Writing a Compiler in C#: Parsing, Part 2

·
Free Resource

Before we proceed to the full BNF of a Jack expression, we need to decide which operators we’re going to support. Our final implementation will have some additional operators, but for now we’ll settle for +, –, *, /, <, >, =, &, |, and !.

One obvious question when dealing with arithmetic and relational operators is the question of operator precedence.

What’s the value of 5+3*2? Is it 16 or 11?

What’s the value of 3<2+5? Is it 6, or 1, or something else entirely depending on the integer coercion of a Boolean value?

You might be tempted to think that operator precedence is the parser’s problem, i.e. we must install special parsing code to handle multiplications before additions. Indeed, this problem can be solved by using two stacks for operators and operands (e.g. see here).

However, we can solve the problem of operator precedence at the BNF level, from which we create the parsing procedures. This is one of the neatest parsing tricks. The following BNF provides proper operator precedence for arithmetic operators:

expression ::= factor ( ('+' | '-') factor )*
factor     ::= term ( ('*' | '/') term )*
term       ::= digit+

From this thing the parser code becomes immediately self-evident:

procedure ParseExpression
    ParseFactor()
    while LookaheadToken <> ";"
        op = NextToken()
        assert op in [ "+", "-" ]
        ParseFactor()
        print op

    end while
end procedure

procedure ParseFactor
    ParseTerm()
    while LookaheadToken <> ";"
        op = NextToken()
        assert op in [ "*", "/" ]
        ParseTerm()
        print op

    end while

end procedure

procedure ParseTerm
    term := NextToken()
    print term
end procedure

What if we wanted to add support for parentheses to modify explicitly the operator precedence in an expression? Again, before being tempted to add special cases to the parser, consider the following grammar modification:

term ::= digit+ | '(' expression ')'

The modification to the parser is also immediate:

procedure ParseTerm
    if LookAheadToken = "("
        ParseBLOCKED EXPRESSION
        Match(")")

    else
        term := NextToken()
        print term
    end if
end procedure

Incredible! Now we’re ready to appreciate the full BNF of a Jack expression (with added operator precedence compared to the TECS version):

expression ::= relexpr ( ('&' | '|') relexpr )*
relexpr    ::= addexpr ( ('<' | '>' | | '=') addexpr )*
addexpr    ::= mulexpr ( ('+' | '-') mulexpr )*
mulexpr    ::= term    ( ('*' | '/') term )*
term       ::= int-const | str-const | keyword-const |
               var-name  | var-name '[' expression ']' |
               sub-call  | '(' expression ')' |
               ('-' | '!') term
sub-call   ::= sub-name '(' expr-list ')' |
               (class-name | var-name) '.' sub-name
                  
'(' expr-list ')'
expr-list  ::= ( expression ( ',' expression )* )?

Whew! This thing recognizes four levels of operator precedence through the use of three intermediate BNF definitions—relexpr, addexpr, and mulexpr. Note that the unary operators have the highest precedence, which is guaranteed by their placement at the bottom-most term level.

As before, the parser code flows naturally and beautifully from this BNF. This time, instead of using “print term” as the placeholder for actual operations, we call the code generator whenever we have some interesting information about the parsed program. (Another alternative is to construct an intermediate representation such as a parse tree, and have the code generator analyze it in one shot.)

private void ParseBLOCKED EXPRESSION
{

ParseRelationalBLOCKED EXPRESSION;
while (IsNextTokenLogicalOp())
{
Token logicalOp = NextToken();
ParseRelationalBLOCKED EXPRESSION;
switch (logicalOp.Value)
{
case "&":
_codeGenerator.And();
break;
case "|":
_codeGenerator.Or();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseRelationalBLOCKED EXPRESSION
{
ParseAddBLOCKED EXPRESSION;
while (IsNextTokenRelationalOp())
{
Token relationalOp = NextToken();
ParseAddBLOCKED EXPRESSION;
switch (relationalOp.Value)
{
case "<":
_codeGenerator.Less();
break;
case ">":
_codeGenerator.Greater();
break;
case "=":
_codeGenerator.Equal();
break;
case "<=":
_codeGenerator.LessOrEqual();
break;
case ">=":
_codeGenerator.GreaterOrEqual();
break;
case "!=":
_codeGenerator.NotEqual();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseAddBLOCKED EXPRESSION
{
ParseMulBLOCKED EXPRESSION;
while (IsNextTokenAddOp())
{
Token addOp = NextToken();
ParseMulBLOCKED EXPRESSION;
switch (addOp.Value)
{
case "+":
_codeGenerator.Add();
break;
case "-":
_codeGenerator.Sub();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseMulBLOCKED EXPRESSION
{
ParseTerm();
while (IsNextTokenMulOp())
{
Token mulOp = NextToken();
ParseTerm();
switch (mulOp.Value)
{
case "*":
_codeGenerator.Mul();
break;
case "/":
_codeGenerator.Div();
break;
case "%":
_codeGenerator.Mod();
break;
default:
ThrowCompilationException(...);
break;
}
}
}
private void ParseTerm()
{
if (LookAheadToken.Type == TokenType.IntConst)
{
Token intConst = NextToken();
_codeGenerator.IntConst(int.Parse(intConst.Value));
}
else if (LookAheadToken.Type == TokenType.StrConst)
{
Token strConst = NextToken();
_codeGenerator.StrConst(strConst.Value);
}
else if (IsNextTokenKeywordConst())
{
Token keywordConst = NextToken();
switch (keywordConst.Value)
{
case "true":
_codeGenerator.True();
break;
case "false":
_codeGenerator.False();
break;
case "null":
_codeGenerator.Null();
break;
case "this":
...
_codeGenerator.This();
break;
default:
ThrowCompilationException(...);
break;
}
}
else if (LookAheadToken.Type == TokenType.Ident)
{
//This is either a variable name, an array access,
//or a subroutine call. We need to look ahead:
Token varName = NextToken();
if (LookAheadToken.Value == ".")
{
ParseSubCall(varName);
}
else
{

//We’ll discuss symbol tables later. For now,
//this thing only helps us to figure out if the
//variable has been defined previously.
Symbol symbol = GetClosestSymbol(varName.Value);
if (symbol == null)
{
ThrowCompilationException(
"Undefined identifier '" +
varName.Value + "'");
}

bool arrAccess = false;
if (LookAheadToken.Value == "[")
{
arrAccess = true;

Match(new Token(TokenType.Symbol, "["));
ParseBLOCKED EXPRESSION;
Match(new Token(TokenType.Symbol, "]"));
}
_codeGenerator.VariableRead(varName, arrAccess);
}
}
else if (LookAheadToken.Value == "(")
{
Match(new Token(TokenType.Symbol, "("));
ParseBLOCKED EXPRESSION;
Match(new Token(TokenType.Symbol, ")"));
}
else if (IsNextTokenUnaryOp())
{
Token unaryOp = NextToken();
ParseTerm();
switch (unaryOp.Value)
{
case "-":
_codeGenerator.Negate();
break;
case "!":
_codeGenerator.Not();
break;
default:
ThrowCompilationException(...);
break;
}
}
else
{
ThrowCompilationException(...);
}
}

I intentionally omitted the parsing procedure for sub-call because it poses an interesting challenge, up next.

Topics:

Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

THE DZONE NEWSLETTER

Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

X

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}