DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Because the DevOps movement has redefined engineering responsibilities, SREs now have to become stewards of observability strategy.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Related

  • Advancing Robot Vision and Control
  • Enhancing Business Decision-Making Through Advanced Data Visualization Techniques
  • Build a Simple REST API Using Python Flask and SQLite (With Tests)
  • How to Perform Custom Error Handling With ANTLR

Trending

  • Infrastructure as Code (IaC) Beyond the Basics
  • Mastering Advanced Traffic Management in Multi-Cloud Kubernetes: Scaling With Multiple Istio Ingress Gateways
  • How Large Tech Companies Architect Resilient Systems for Millions of Users
  • Designing for Sustainability: The Rise of Green Software
  1. DZone
  2. Coding
  3. Languages
  4. ANTLR 4 with Python 2 Detailed Example

ANTLR 4 with Python 2 Detailed Example

ANTLR 4 introduced a handy listener-based API, but sometimes it's better not to use it.

By 
Alan Hohn user avatar
Alan Hohn
·
Apr. 04, 16 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
24.9K Views

Join the DZone community and get the full member experience.

Join For Free

In a previous post I showed a very simple example using ANTLR 4 with Python 2. While this example gave the basic framework necessary, it didn't delve very deeply into ANTLR's API. In this article, I'll give a little more detail.

Grammar

To get started, we need grammar that is more complex than the basic "Hello, world!" grammar. There are a large number of examples for ANTLR 4 grammar on GitHub. I started with arithmetic grammar and simplified it (removing exponents and scientific notation).

The most interesting part of the grammar is this:

expression
   : multiplyingExpression ((PLUS | MINUS) multiplyingExpression)*
   ;

multiplyingExpression
   : number ((TIMES | DIV) number)*
   ;

number
   : MINUS? DIGIT + 
   ;

Our top-level construct is an expression. It consists of multiplying expressions separated by either a plus or a minus. Similarly, a multiplying expression consists of numbers separated by a times or division sign.

There are a couple smart points in the way this grammar is assembled. First, note that a number by itself is a valid multiplying expression, and therefore a single number is a valid expression. Second, by breaking the plus/minus and times/div into separate tokens, it makes it much easier to handle order of operations (since they will be in different levels of the tree).

Code Generation

As before, we run ANTLR on the grammar to generate code. In the previous example I showed using the downloaded JAR directly. If using a package manager where antlr4ends up on the path, the command is:

$ antlr4 -Dlanguage=Python2 arithmetic.g4

This generates a lexer, parser, and a base class for a listener.

Tree Walking

I'll give the main body of the code first. It starts in a similar way to the previous example:

def main():
    lexer = arithmeticLexer(antlr4.StdinStream())
    stream = antlr4.CommonTokenStream(lexer)
    parser = arithmeticParser(stream)
    tree = parser.expression()
    handleExpression(tree)
if __name__ == '__main__':
    main()

We start by reading from standard in and passing that through the lexer, then the parser. This builds a tree of the parsed input.

At this point, we head in a different direction from the previous example. Rather than creating a walker to walk the tree, providing a listener class, we instead just pass the tree to a method that walks the tree directly.

Listener Class Challenges

The purpose of a listener class is to turn the tree into a series of stream-like events to make processing easier. However, in this case we run into a problem. We have listener methods available for expressions, multiplying expressions, and numbers, because these are defined in terms of other tokens. But we don't have a method available for "terminal" nodes such as our operators ("+", "-", "*", "/"). We only find out about these by inspecting the context of parent objects (expression and multiplying expression).

Also, because we are using infix notation, at the time we see the operator, we don't have all of the information that we're going to need to use that operator. In fact, because expressions can be arbitrarily long, we couldn't look ahead to get the information we need, even if there was a convenient way to do that with ANTLR.

Fortunately, the ANTLR API provides us with the means to iterate over the children of a node. So we don't have to wait for stream events to fire; instead, we can walk through the children in order.

Here's the resulting implementation. I'm not yet convinced that this is the best solution, but it does work and it is reasonably compact.

First, we need an entry method that handles the top-level expression:

def handleExpression(expr):
    adding = True
    value = 0
    for child in expr.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            adding = child.getText() == "+"
        else:
            multValue = handleMultiply(child)
            if adding:
                value += multValue
            else:
                value -= multValue

    print "Parsed expression %s has value %s" % (expr.getText(), value)

We iterate over the children; where we find a multiplying expression, we evaluate it. Where we find an operator, we use it to set a flag indicating the next operation to perform.

Multiplying expressions are handled in a similar way:

def handleMultiply(expr):
    multiplying = True
    value = 1
    for child in expr.getChildren():
        if isinstance(child, antlr4.tree.Tree.TerminalNode):
            multiplying = child.getText() == "*"
        else:
            if multiplying:
                value *= int(child.getText())
            else:
                value /= int(child.getText())

    return value

Note that we set the initial value to 1, so if we see an expression with a single number we will handle it correctly. Also, to handle numbers we just get the text form, which contains all the digits, and turn it into an integer. (The ANTLR tree contains each digit individually but this is easier.)

Of course it feels a little clunky to use isinstance in this implementation, and a little clunky to store state related to the "last seen" operator, but the alternatives seemed even clunkier. For example, the expression contains a PLUS() method, but this method is just a list of all of the "+" symbols in the expression.

One nice thing about this implementation is that it's safer than it might initially look. ANTLR catches errors in parsing the input string, so any characters that don't fit our grammar result in an error before our parsing code is invoked. (The example code doesn't yet handle those errors, but at least they are caught.) This means that it's safe to assume that any number we find can be converted to an integer, and that if there is an operator in a multiplying expression, it's either "*" or "/" because those are the only ones that are valid.

Results

With this implementation in place, we have a somewhat functional calculator similar to the UNIX command bc:

$ echo "3 * 3 - 2 + 2 * 2" | python arithmetic.py 
Parsed expression 3*3-2+2*2 has value 11

Conclusion

Even though we're still in the realm of a simple example with ANTLR, we've already gotten into some deeper waters with how we had to handle the parsed input. If we were to add back in some of the extra operators, notations, and parenthetical expressions, we might have to consider refactoring to avoid additional duplication of code.

ANTLR Python (language)

Opinions expressed by DZone contributors are their own.

Related

  • Advancing Robot Vision and Control
  • Enhancing Business Decision-Making Through Advanced Data Visualization Techniques
  • Build a Simple REST API Using Python Flask and SQLite (With Tests)
  • How to Perform Custom Error Handling With ANTLR

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!