
I Built a Query Parser Over a Weekend (Part 1)

The parser I built wasn’t for JSON, one of the simplest formats. I wanted to play with a query language, something SQL-like, and that is anything but trivial to do.

Oren Eini · Jul. 25, 17 · Opinion · 5.99K Views


Some tasks are fun — they are self-contained, easy to conceptualize (though not always easy to build), and challenging. A few weeks ago, I spent my weekend writing a parser, and it was a lot of fun.

I’ve been writing parsers for a very long time. My book about them came out in 2010 and I've been playing with Boo since 2005. The ANTLR book was an interesting read and it taught me a lot about how to approach text parsing.

However, you may have noticed that I've shifted my thinking on a lot of design problems: in particular, performance, the number of allocations, the exceptions thrown during parsing, and the readability of the errors produced for invalid input.

The Lucene query parser is a good example of a bad one: it fails on pretty much all of the above points. I have worked with a bunch of parser generators, and I never found one whose output I really liked. They typically generate unreadable code, and customizing their behavior is anything but obvious.

As Martin Fowler has said (only slightly out of context):

…it's hard to write a parser.

My most recent parser is the RavenDB JSON parser. You can see the progression of the ideas around that in this series of posts. (That isn’t something you’ll want to read without a cup of coffee and something to take notes with.)

Most non-trivial parsers are composed of at least two pieces: a tokenizer and a builder. The tokenizer goes over the input and breaks it into tokens, which the builder uses to build the final format. The JSON scanner in RavenDB is called UnmanagedJsonParser and the builder is BlittableJsonDocumentBuilder. Traditionally, they would be called the scanner and the parser, for the roles they play, but we’ll leave the names as they are because it doesn’t really matter. This code is not fun to go through. It has been through multiple performance reviews, each time making it uglier than before, but much faster.
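That split is easier to see in code than in prose. The sketch below is not RavenDB's code (UnmanagedJsonParser and BlittableJsonDocumentBuilder are far more involved); it is a minimal Python illustration of the tokenizer/builder division for a toy subset of JSON, with illustrative token names:

```python
import re

# Token patterns for a toy subset of JSON (no escapes, no nested arrays).
TOKEN_SPEC = [
    ("NUMBER", r"-?\d+(?:\.\d+)?"),
    ("STRING", r'"[^"]*"'),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("COLON",  r":"),
    ("COMMA",  r","),
    ("WS",     r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Tokenizer: walk the input and yield (kind, lexeme) pairs."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}: {text[pos]!r}")
        pos = m.end()
        if m.lastgroup != "WS":          # whitespace never reaches the builder
            yield m.lastgroup, m.group()

def build_object(tokens):
    """Builder: consume tokens and produce the final format (here, a dict)."""
    tokens = iter(tokens)
    kind, _ = next(tokens)
    if kind != "LBRACE":
        raise ValueError("expected '{'")
    result = {}
    for kind, lexeme in tokens:
        if kind == "RBRACE":
            return result
        if kind == "COMMA":
            continue
        if kind != "STRING":
            raise ValueError(f"expected a key, got {kind}")
        key = lexeme.strip('"')
        if next(tokens)[0] != "COLON":
            raise ValueError("expected ':' after key")
        vkind, vlex = next(tokens)
        result[key] = float(vlex) if vkind == "NUMBER" else vlex.strip('"')
    raise ValueError("unterminated object")

print(build_object(tokenize('{"name": "RavenDB", "version": 4.0}')))
# → {'name': 'RavenDB', 'version': 4.0}
```

The point of the division is that the tokenizer knows nothing about the shape of the output and the builder never touches raw characters, so each side can be optimized (or made ugly for speed) independently.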

JSON is also one of the simplest possible textual formats. The formal definition of JSON fits a post-it note. The JSON scanner I have for RavenDB is close to 900 lines of code and is only part of the parsing process.
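To see just how small the format is, here is essentially the whole JSON grammar, condensed into EBNF (string escapes and the details of numeric literals elided):

```
value   = object | array | string | number | "true" | "false" | "null" ;
object  = "{" , [ member , { "," , member } ] , "}" ;
member  = string , ":" , value ;
array   = "[" , [ value , { "," , value } ] , "]" ;
string  = '"' , { character } , '"' ;
number  = [ "-" ] , integer , [ fraction ] , [ exponent ] ;
```

Which makes the point: the bulk of a production scanner goes to performance and error handling, not to the grammar itself.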

But the parser I built over the weekend wasn’t for JSON. Instead, I wanted to play with a query language, so naturally, I wanted something SQL-like. And that is anything but trivial to do.

The first thing I needed was to actually sit down and figure out what the language would look like. In order to do that, you almost always want to use a BNF notation of some kind. This allows you to specify what your language should look like — not just as a few snippets in a notepad window but in a more structured manner.
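As an illustration (this is a hypothetical grammar, not the actual RavenDB query language, and GOLD uses its own notation), a first draft of a SQL-like query grammar in BNF might look like:

```
<Query>       ::= SELECT <FieldList> FROM <Identifier> [ <WhereClause> ]
<FieldList>   ::= '*' | <Identifier> { ',' <Identifier> }
<WhereClause> ::= WHERE <Expression>
<Expression>  ::= <Expression> AND <Condition>
                | <Expression> OR <Condition>
                | <Condition>
<Condition>   ::= <Identifier> <Operator> <Value> | '(' <Expression> ')'
<Operator>    ::= '=' | '<' | '>' | BETWEEN | IN
<Value>       ::= <String> | <Number> | <Parameter>
```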

More to the point, there are a lot of tools out there to use. I decided to use GOLD Parser. It was last updated in 2012 (and it shows), but it has the lowest friction of all the parser IDEs that I tried, and it has great support for debugging and working with grammars. Why not use ANTLR, which is pretty much the default choice? Put simply, it is usually too much of a hassle to set up ANTLR properly, and I didn’t want to get bogged down with the actual details of generating the parsers. I wanted to focus on the grammar.

I actually don’t know how to parse text using the GOLD Parser. It looks like it generates a binary file that you feed to some library that would do it for you, but I’m not sure and it doesn’t matter. What I care about is that I can develop a formal definition and debug it easily. I’m not actually going to use it to generate the parser.

Huh? Why do all this work for no reason?

A formal definition of the language is incredibly helpful when you consider a new syntax because you can verify that you aren’t creating holes and ambiguities in your language. It also gives you pretty clear guidelines on how to implement the language.
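For example, a naive draft of a WHERE-clause expression rule is ambiguous, and writing it down formally is exactly what exposes that (rule names here are illustrative):

```
; Ambiguous: "a = 1 OR b = 2 AND c = 3" can be grouped two different ways.
<Expr> ::= <Expr> OR <Expr> | <Expr> AND <Expr> | <Cond>

; Fixed by layering the rules so AND binds tighter than OR.
<Expr>    ::= <OrExpr>
<OrExpr>  ::= <OrExpr> OR <AndExpr> | <AndExpr>
<AndExpr> ::= <AndExpr> AND <Cond>  | <Cond>
```

A parser IDE will flag the first form with conflicts, which is precisely the kind of hole you want to find before committing to a syntax.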

I’m going to go into more details about the language itself and building a parser for it in my next post.

Tags: Parser (programming language), Database

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
