Q&A with Terence Parr on ANTLR

Learn about the ANTLR parser from the creator himself.

Matthew Casperson

Jan. 11, 16 · Interview


What was the inspiration for the development of ANTLR?

By around 1988, I had built a lot of parsers, but manually as recursive-descent parsers rather than via yacc grammars. I didn't understand those infernal yacc reduce-reduce conflicts and didn't like the black box state machine you got out of yacc anyway. I liked the ability to step through my recursive-descent parsers with a debugger. However, grammar formalisms are very useful, and I used them as comments in front of my parsing functions. For example, I would do something like:

/* decl : type ID ; */

void decl() { type(); match(ID); }

Both are useful. It's easier to read the grammar but of course I can still debug the actual parsing code.
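
As an editorial illustration of the pattern described above (not Parr's original code), here is a minimal runnable sketch of a hand-written recursive-descent parser with the grammar rules kept as comments. The class name DeclParser, the helper methods, the whitespace-based tokenizing, and the `type : 'int' | 'float'` rule are all assumptions made for the example; the interview only gives the `decl` rule.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one method per grammar rule, with the rule as a comment.
class DeclParser {
    private final List<String> tokens; // input pre-split on whitespace (assumed lexer)
    private int pos = 0;               // index of the current lookahead token

    DeclParser(String input) {
        String trimmed = input.trim();
        String[] parts = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        this.tokens = Arrays.asList(parts);
    }

    // Current token, or a sentinel that matches no rule once input is exhausted.
    private String lookahead() {
        return pos < tokens.size() ? tokens.get(pos) : "<EOF>";
    }

    private static boolean isTypeKeyword(String t) {
        return t.equals("int") || t.equals("float");
    }

    /* decl : type ID ; */
    void decl() {
        type();
        matchId();
    }

    /* type : 'int' | 'float' ;   (assumed; not defined in the interview) */
    void type() {
        String t = lookahead();
        if (!isTypeKeyword(t)) {
            throw new IllegalStateException("expected type but found " + t);
        }
        pos++;
    }

    /* ID : [A-Za-z_] [A-Za-z_0-9]* ;   plays the role of match(ID) */
    void matchId() {
        String t = lookahead();
        if (isTypeKeyword(t) || !t.matches("[A-Za-z_][A-Za-z_0-9]*")) {
            throw new IllegalStateException("expected ID but found " + t);
        }
        pos++;
    }

    // Returns true only if the whole input is exactly one decl.
    static boolean accepts(String input) {
        DeclParser p = new DeclParser(input);
        try {
            p.decl();
            return p.pos == p.tokens.size(); // reject trailing tokens
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

Calling `DeclParser.accepts("int x")` walks `decl()`, then `type()`, then `matchId()`, so a breakpoint in any of those methods shows exactly which rule is being matched. That is the debuggability the answer above is pointing at.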

Translating grammars to parsers manually gets pretty tedious and is error-prone. In the fall of 1988, I took a course by Hank Dietz, then at Purdue University, where he taught us to build lex and yacc-like tools. That was the genesis of ANTLR (ANother Tool for Language Recognition). It became my Master's and PhD work, first as an LL(1) and then as a predicated LL(k) parser generator, always generating code similar to what I would build by hand.

What are some of the biggest challenges creating a tool like ANTLR?

Well, first, let me say that people thought I was crazy trying to take on the entrenched yacc/bison tools. That really wasn't my goal though. I just wanted a tool for my own use. Then I found out that there were lots of people like me that didn't understand LALR(1)/yacc and wanted to be able to single step through their parser for debugging purposes. As people started using ANTLR, I put more and more time into it.

Ok, so the biggest challenge creating a tool like ANTLR is making it sufficiently powerful but also accessible to the programmers in the target market. It doesn't do any good to produce an amazing tool that's hard to use, hard to understand, or doesn't interface with their favorite programming language.

The next biggest challenge is probably staying true to the winning formula and keeping conceptual integrity. E.g., UNIX's conceptual integrity is that everything is a stream. Programs talk to everything (the keyboard, disk, network, even other processes) via streams. ANTLR's secret sauce is that it stays within programmers' comfort zone, translating rules like

decl : type ID ;

to parser code that is more or less human readable:

void decl() { type(); match(ID); }

The trick has been cranking up the power over the last 25 years without deviating from the secret sauce. I carefully evaluate all grammar metalanguage changes, features, and functionality to make sure it all fits together as a cohesive whole.

What has your experience been like with an open source project like ANTLR?

Well, the mechanics have changed a lot over the last 25 years, particularly with the advent of collaborative tools like github.com. In the early 90s, people had to mail a patch and ask me to evaluate it. It was hard to find software and collaborators before the web got really going around '95.

Naturally, authoring a successful tool is very rewarding. On the other hand, managing such a project can be very frustrating and tedious, not to mention a huge amount of work. When everyone else goes on vacation, you still get bug reports from important "customers." You might be surprised by how much "hate mail" you can get when you don't fix a bug or accept a patch.

Managing the personalities that all have to work together presents its own challenges. People with very potent and specific skill sets often don't mesh due to conflicting goals. It's been my job to navigate this minefield sometimes and make decisions that are going to annoy many contributors. Running the ANTLR project means rejecting code or ideas that don't fit in my worldview. People hate rejection. Some people get over it and some people don't. People tend to drift in and out as their own needs align with the ANTLR project's interests.

In the final analysis, I'm a benevolent dictator; benevolent, but a dictator nonetheless.

Who are the primary contributors to ANTLR?

I've learned a lot and accepted a lot of source code from countless programmers around the world over the last two and a half decades. But, for the current ANTLR 4 software, Sam Harwell is my co-author and he has also made critical contributions to the Adaptive LL(*), or ALL(*), parsing algorithm. For academic papers on ANTLR, I've worked with Russell Quong and, most recently, with Kathleen Fisher at Tufts University. (Sam was also a co-author on the recent ALL(*) paper.) Another programmer, Eric Vergnaud, has contributed significant effort to ANTLR 4 by creating a general test rig and also building the Python 2|3 and JavaScript targets. For ANTLR 3 credits, readers can check out:

http://www.antlr3.org/credits.html

Aside from the IDE plugins, Jean Bovet built the awesome ANTLRWorks for ANTLR 3 and we wrote a paper together on it. There is even this crazy guy, Tom Everett, who has taken it upon himself to be the curator of the ANTLR grammar repository:

https://github.com/antlr/grammars-v4

I apologize in advance for not mentioning every contributor, large and small, by name. Such a list would fill this page no doubt.

And, finally, I'd like to mention how much I've gained from competitive technologies and tools, such as Bryan Ford's Parsing Expression Grammars (PEGs) and Sriram Sankar's JavaCC parser generator. I consider both of them friends and their work has influenced the ANTLR project in a very positive way.

Where do you see the ANTLR project going over the next few years?

Believe it or not, I finally have the parser generator I wanted in the late 80s. The ALL(*) parsing strategy more or less solves the parsing problem in that it handles just about any grammar and can be made extremely efficient. That is just my opinion, of course, but it means I'm likely done with parsing theory work. I plan to continue fixing bugs and possibly adding features as I need them. For example, I really need to add a parse tree factory mechanism so that ANTLR will build parse trees with desired node types. I also really enjoy building plug-ins for IntelliJ and related IDEs, so I'll probably keep doing that kind of stuff.

I plan to make a major update to the Language Implementation Patterns book to include lots of new examples, such as virtual machines written in C, malloc() implementations, and automatic garbage collection implementations.

What are some of your favorite projects, books, tutorials or blog posts on ANTLR?

The following two books are really the best place for people to start. They are cheap and full of good information:

Language Implementation Patterns. http://amzn.com/193435645X

The Definitive ANTLR 4 Reference. http://amzn.com/1934356999

Those interested in the parsing engine of the ANTLR 4 runtime can wade through the following dense academic paper.

Adaptive LL(*) Parsing: The Power of Dynamic Analysis

http://www.antlr.org/papers/allstar-techreport.pdf

I've moved the ANTLR documentation of the tool itself to the repository and would welcome updates via pull requests:

https://github.com/antlr/antlr4/blob/master/doc/index.md

Readers might also be interested in my StringTemplate, a Java template engine (with ports for C#, Objective-C, JavaScript) for generating source code, web pages, emails, or any other formatted text output. ANTLR itself is written using StringTemplate to generate parsers in multiple languages:

http://www.stringtemplate.org/

What advice do you have for other open-source projects or those looking for a project?

Look for the pain. A lot of people ask me how to come up with a project and the best advice is to identify pain points when writing software. Chances are others will have the same problem. If you can come up with a library or a tool to reduce pain, you have yourself a project.

Start small. Don't try to solve a large problem in one giant release. It's better to push out usable little pieces, chipping away at the larger problem. Others will find the pieces useful and it's a good way to look for collaborators.

Be courageous. Be confident that you can eventually produce something of quality and don't worry if your initial versions are overly simplistic, incomplete, or kind of crappy. Do your best to produce good software, but don't worry if it's not perfect. Even an imperfect solution to a problem is often good enough. Or, at least, it can help others go in the right direction. It's usually better to just get something going rather than flounder about trying to figure out the perfect solution before commencing or publishing. Make forward progress.

Don't try to please everyone. It's better to have a really focused product rather than one that tries to satisfy everyone. The focused product doesn't have to make compromises on the task or problem it solves to satisfy ancillary features. By analogy, build a flavorful dark German rye, not bland US white bread. If people don't like your rye bread, no problem. They can go find a tasty sourdough loaf or make their own.

Don't be afraid to start over. If abandoning a repository and starting fresh produces a much better product, do it. You might not get "customers" to jump ship for the new version, but they will if it's good enough. At the least, new projects can use your new version. If the versions are outwardly compatible, everybody wins. Note that ANTLR 4 is the 4th complete rewrite of the software, with major breaking changes between versions, but there is wide adoption of v4 because it is a significant improvement over v3.

Be a benevolent dictator with a clear voice. Leadership and "product" management really need to have a single voice, even if there is a committee that makes recommendations.

Opinions expressed by DZone contributors are their own.
