Custom Grammar to Query JSON With Antlr

Want to learn how to create a custom grammar to query JSON? Check out this tutorial to learn how to write queries with Antlr!

By Uday Chandra · Sep. 18, 18 · Tutorial

Antlr is a powerful tool that can be used to create formal languages. Vital to the formalization of a language are symbols and rules, also known as grammar. Defining custom grammar and generating the associated parsers and lexers is a straightforward process with Antlr. Antlr’s runtime enables tokenization of a given character stream and parsing of those tokens. It provides mechanisms to walk through the generated parse tree and apply custom logic. Let’s take this tool for a spin and create a custom grammar to query JSON. Our end goal is to be able to write queries like the one shown below:

bpi.current.code eq "USD" and bpi.current.rate gt 650.60
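
To make the intent of such a query concrete, here is a minimal sketch of how a dotted attribute path could be resolved against a JSON document. It is not taken from the article's source code: the AttrPathResolver class is hypothetical, nested maps stand in for a parsed JSON object, and the sample values are made up.

```java
import java.util.Map;

// Hypothetical helper, not part of the article's source code. Nested maps
// stand in for a parsed JSON object.
final class AttrPathResolver {

    private AttrPathResolver() {}

    /** Returns the value at the dotted path, or null if any segment is missing. */
    static Object resolve(Map<String, Object> json, String attrPath) {
        Object current = json;
        for (String segment : attrPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a scalar value
            }
            current = ((Map<?, ?>) current).get(segment);
            if (current == null) {
                return null; // segment not present
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Made-up sample data shaped like the paths used in the query
        Map<String, Object> item = Map.of(
                "bpi", Map.of("current", Map.of("code", "USD", "rate", 657.25)));

        // Hand-evaluated equivalent of:
        // bpi.current.code eq "USD" and bpi.current.rate gt 650.60
        Object code = resolve(item, "bpi.current.code");
        Object rate = resolve(item, "bpi.current.rate");
        boolean match = "USD".equals(code)
                && rate instanceof Number
                && ((Number) rate).doubleValue() > 650.60;
        System.out.println(match);
    }
}
```

The grammar and query engine built in the rest of the article replace the hand-written comparison in main with logic driven by the parsed expression.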


To create a new grammar, we first have to define its rules. Let’s do that by creating a file named “JsonQuery.g4”. We can then start composing the grammar rules that will allow us to query JSON. Here’s the snippet:

grammar JsonQuery;

query
   : SP? '(' query ')'                              #parenExp
   | query SP LOGICAL_OPERATOR SP query             #logicalExp
   | attrPath SP 'pr'                               #presentExp
   | attrPath SP op=( 'eq' | 'ne' ) SP value        #compareExp
   ;

LOGICAL_OPERATOR
   : 'and' | 'or'
   ;

EQ : 'eq' ;
NE : 'ne' ;

attrPath
   : ATTRNAME subAttr?
   ;

subAttr
   : '.' attrPath
   ;

ATTRNAME
   : ALPHA ATTR_NAME_CHAR* ;

fragment ATTR_NAME_CHAR
   : '-' | '_' | ':' | DIGIT | ALPHA
   ;


You can browse the complete set of rules here.
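
As a rough illustration of what the ATTRNAME and attrPath rules accept, the sketch below mirrors them with a regular expression. It is only an approximation for experimenting outside Antlr: the AttrPathSyntax class is hypothetical, and because the ALPHA and DIGIT fragments are not shown in the excerpt, they are assumed here to be ASCII letters and digits.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the article's source code.
final class AttrPathSyntax {

    // ATTRNAME : ALPHA ATTR_NAME_CHAR*
    // Assumption: ALPHA is [A-Za-z] and DIGIT is [0-9]
    private static final String ATTRNAME = "[A-Za-z][A-Za-z0-9_:-]*";

    // attrPath : ATTRNAME subAttr?   and   subAttr : '.' attrPath
    // The recursion unrolls to one ATTRNAME followed by zero or more ".ATTRNAME"
    private static final Pattern ATTR_PATH =
            Pattern.compile(ATTRNAME + "(\\." + ATTRNAME + ")*");

    private AttrPathSyntax() {}

    static boolean isValid(String path) {
        return ATTR_PATH.matcher(path).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("bpi.current.code")); // letters separated by dots
        System.out.println(isValid("9lives"));           // starts with a digit
    }
}
```

A path such as “bpi.current.code” matches, while a name that starts with a digit or contains an empty segment (“bpi..code”) does not.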

Antlr mandates that we follow certain conventions while creating grammars. For starters, the file should begin with a grammar declaration (here, “grammar JsonQuery;”), and the declared name must match the name of the file holding the grammar (“JsonQuery.g4”). Antlr recognizes two types of rules: parser rules and lexer rules. Parser rules have to start with a lowercase letter, and lexer rules have to start with an uppercase letter. In the snippet above, “query” is a parser rule and “EQ” is a lexer rule. Rule alternatives, like the ones defined for the “query” parser rule, can be labeled by using the “#” operator (e.g., “#parenExp”). Labeling alternatives makes Antlr generate a dedicated listener or visitor method for each alternative, so we get more precise events while we walk the parse tree. As I mentioned before, Antlr is extremely versatile and provides a plethora of features, from defining rules and generating parsers, lexers, listeners, and visitors to non-greedy sub-rules and ways to handle precedence and left recursion.

Antlr also provides IDE plugins that can be used to create and visualize a grammar. We can quickly test sample expressions against our grammar and preview the generated parse tree. Here’s a view of the generated parse tree based on the JSON query expression that we wrote earlier:

[Figure: parse tree generated for the JSON query expression]

Now that we have a working grammar for querying JSON, let’s turn our attention to creating a Java program and implementing a query engine. The engine will walk the generated parse tree based on a given query expression, evaluate it against the specified JSON object, and return a boolean value to indicate if the query is a match or not. Let’s use Gradle to create our project. Here’s the relevant Gradle build file to enable the Antlr plugin and its dependencies:

plugins {
    id "antlr"
}

dependencies {
    antlr "org.antlr:antlr4:4.7"
}

generateGrammarSource {
    arguments += ["-visitor"]
}


Note that Antlr can be configured to generate a listener class or a visitor class, which are its two mechanisms for walking a parse tree. The listener is generated by default; the “-visitor” argument in the build file above asks for the visitor as well. We will use the visitor mechanism to walk through the parse tree and evaluate the query expression. Antlr’s Gradle plugin will generate the source code that defines the lexer, parser, and visitor classes based on our grammar. We can simply extend the generated base visitor class and override the relevant methods with custom logic to evaluate a JSON query expression. Here’s a snippet from the JsonQueryEvaluator class:

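public class JsonQueryEvaluator
        extends JsonQueryBaseVisitor<Boolean> {

    @Override
    public Boolean visitParenExp(ParenExpContext ctx) {
        // The parser rule is named "query", so the generated accessor is query()
        Boolean result = visit(ctx.query());
        // NOT() assumes the complete grammar allows an optional NOT token
        // on the parenExp alternative; the excerpt above does not show it
        return ctx.NOT() != null ? !result : result;
    }

    @Override
    public Boolean visitLogicalExp(LogicalExpContext ctx) {
        Boolean leftExp = visit(ctx.query(0));

        // OR is assumed to be a constant holding "or", defined in the elided code
        if (OR.equals(ctx.LOGICAL_OPERATOR().getText())) {
            // Short circuit "or": visit the right side only if the left is false
            return leftExp || visit(ctx.query(1));

        } else {
            // Short circuit "and": visit the right side only if the left is true
            return leftExp && visit(ctx.query(1));
        }
    }
    ...
}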


Notice how the visitor method names were generated based on the labels that we specified in our grammar. This gives us the ability to evaluate the various alternatives of a parser rule against a given JSON object. Had we not used labels, we would have been forced to use numerous if-else or switch statements to implement the same functionality.
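
The dispatch described above can be tried without Antlr. The following is a minimal, self-contained sketch in which hand-written node classes stand in for the generated context classes, with one visit method per labeled alternative. Every name in it (QueryNode, QueryVisitor, MapQueryEvaluator, and so on) is hypothetical, and a flat map with undotted attribute names stands in for the JSON object to keep the example short.

```java
import java.util.Map;
import java.util.Objects;

// Hand-written stand-ins for what Antlr generates from labeled alternatives.
interface QueryNode {
    <T> T accept(QueryVisitor<T> visitor);
}

interface QueryVisitor<T> {
    T visitLogicalExp(LogicalExp node);
    T visitCompareExp(CompareExp node);
    T visitPresentExp(PresentExp node);
}

final class LogicalExp implements QueryNode {
    final QueryNode left;
    final String operator; // "and" or "or"
    final QueryNode right;

    LogicalExp(QueryNode left, String operator, QueryNode right) {
        this.left = left;
        this.operator = operator;
        this.right = right;
    }

    @Override
    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitLogicalExp(this);
    }
}

final class CompareExp implements QueryNode {
    final String attr;
    final String op; // "eq" or "ne"
    final Object value;

    CompareExp(String attr, String op, Object value) {
        this.attr = attr;
        this.op = op;
        this.value = value;
    }

    @Override
    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitCompareExp(this);
    }
}

final class PresentExp implements QueryNode {
    final String attr;

    PresentExp(String attr) {
        this.attr = attr;
    }

    @Override
    public <T> T accept(QueryVisitor<T> visitor) {
        return visitor.visitPresentExp(this);
    }
}

// Evaluates a query tree against a flat map; no if-else chain over node types.
final class MapQueryEvaluator implements QueryVisitor<Boolean> {
    private final Map<String, Object> item;

    MapQueryEvaluator(Map<String, Object> item) {
        this.item = item;
    }

    @Override
    public Boolean visitLogicalExp(LogicalExp node) {
        boolean left = node.left.accept(this);
        if ("or".equals(node.operator)) {
            return left || node.right.accept(this); // right side skipped when left is true
        }
        return left && node.right.accept(this);     // right side skipped when left is false
    }

    @Override
    public Boolean visitCompareExp(CompareExp node) {
        boolean equal = Objects.equals(item.get(node.attr), node.value);
        return "eq".equals(node.op) ? equal : !equal;
    }

    @Override
    public Boolean visitPresentExp(PresentExp node) {
        return item.containsKey(node.attr);
    }
}
```

Building a tree such as new LogicalExp(new CompareExp("code", "eq", "USD"), "and", new PresentExp("rate")) and calling accept with a MapQueryEvaluator walks it in the same way the generated visitor walks a parse tree.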

Now that we have a custom evaluator, let’s create the query engine class. Its job is to stream an expression to the lexer, tokenize that stream, generate the corresponding parse tree, and then walk the parse tree to evaluate the expression against a JSON object. Here’s a snippet from the JsonQueryEngine class:

public class JsonQueryEngine {

    public boolean execute(String expression, JsonObject item) {
        if (StringUtils.isNotBlank(expression)) {

            CharStream stream = CharStreams
                    .fromString(expression.trim());

            // Generated class names are derived from the grammar name "JsonQuery"
            JsonQueryLexer lexer = new JsonQueryLexer(stream);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            JsonQueryParser parser = new JsonQueryParser(tokens);

            // Start parsing at the top-level "query" rule
            ParseTree parseTree = parser.query();
            JsonQueryEvaluator evaluator =
                    new JsonQueryEvaluator(item);

            return evaluator.visit(parseTree);

        } else {
            ...
        }
    }
    ...
}


That’s it, folks. We now have a custom grammar that can be used to, for example, assert conditions within a JSON object while writing tests. Of course, there’s room for improvement in terms of optimizing the grammar and the parsing logic. Head over to GitHub to grab the source code and experiment with it.

Happy coding!


Published at DZone with permission of Uday Chandra, DZone MVB. See the original article here.
