
Parsing in Python: Tools and Libraries (Part 7)


Sometimes, you *need* to build a parser, but you really don't *want* to. That's where parser combinators come into play.


Check out Part 6 here!

Parser Combinators

Parser combinators allow you to create a parser by combining different pattern-matching functions that are equivalent to grammar rules. They are generally considered best-suited for simpler parsing needs.

In practice, this means that they are very useful for all the little parsing problems you find. If the typical developer encounters a problem that is too complex for a simple regular expression, these libraries are usually the solution. In short, if you need to build a parser but you don’t actually want to, a parser combinator may be your best option.
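To make the idea concrete, here is a minimal sketch of parser combinators in plain Python, not taken from any of the libraries discussed below. It assumes a parser is simply a function that takes the text and a position and returns a (value, new position) pair, or None on failure. All names (regex, literal, seq, alt, fmap, parse_all) are hypothetical helpers invented for this illustration.

```python
import re

# Assumption: a parser is a function (text, pos) -> (value, new_pos), or None on failure.

def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)
    def parse(text, pos):
        m = compiled.match(text, pos)
        return None if m is None else (m.group(0), m.end())
    return parse

def literal(s):
    """Parser that matches an exact string."""
    def parse(text, pos):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse

def seq(*parsers):
    """Combinator: run parsers one after another and collect their values."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def alt(*parsers):
    """Combinator: return the result of the first parser that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def fmap(parser, fn):
    """Combinator: transform the value produced by a parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse

def parse_all(parser, text):
    """Run a parser and require that it consumes the whole input."""
    result = parser(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError("could not parse: %r" % text)
    return result[0]

# Each definition mirrors a grammar rule:
#   pair  := word "=" value
#   value := number | word
number = fmap(regex(r"[0-9]+"), int)
word = regex(r"[A-Za-z_]+")
pair = fmap(seq(word, literal("="), alt(number, word)),
            lambda parts: (parts[0], parts[2]))

print(parse_all(pair, "retries=3"))   # ('retries', 3)
print(parse_all(pair, "mode=fast"))   # ('mode', 'fast')
```

Real libraries add error reporting, repetition, and many more combinators, but the principle is the same: small functions that each match one pattern, glued together by functions that play the role of grammar rules.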

Parsec.py, Parsy, and Pyparsing

Parsec.py describes itself as "a universal Python parser combinator library inspired by Parsec library of Haskell."

That is basically the extent of the documentation on Parsec.py, though there are a couple of examples. If you already know how to use the original Parsec library or one of its many clones, you can try to use it. It doesn't look bad, but the lack of documentation is a problem for new users.

Parsy is an easy way to combine simple, small parsers into complex, larger parsers. If it means anything to you, it’s a monadic parser combinator library for LL(infinity) grammars in the spirit of Parsec, Parsnip, and Parsimmon.

Parsy was abandoned for a while, but it was recently picked up by a new maintainer and is now in good shape. Among other things, the new maintainer brought the project up to current development practices (e.g., test coverage).

The project might not be as powerful as an “industrial-strength” parser combinator library such as Parsec (the original one), but it has a few nice features. For instance, you can write a generator function to create a parser. It now requires Python 3.3 or later, which should only be a problem for people stuck with Python 2.

The project now has ample documentation, examples, and a tutorial. The following example comes from the documentation and shows how to parse a date:

# from the documentation
# parsing a date
from parsy import string, regex
from datetime import date
ddmmyy = regex(r'[0-9]{2}').map(int).sep_by(string("-"), min=3, max=3).combine(
               lambda d, m, y: date(2000 + y, m, d))
ddmmyy.parse('06-05-14')

The pyparsing module is an alternative approach to creating and executing simple grammars vs. the traditional lex/yacc approach or the use of regular expressions. The pyparsing module provides a library of classes that client code uses to construct the grammar directly in Python code.

Pyparsing is stable, mature software that has been developed for more than 14 years and comes with many examples, but it is still confusing and short on documentation. While Pyparsing is as powerful as a traditional parser combinator library, it works a bit differently, and the lack of proper documentation makes it frustrating.

However, if you take the time to learn it on its own, the following example shows that it can be easy to use:

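# example from the documentation
from pyparsing import Word, alphas

# define grammar: a word, a comma, another word, and an exclamation mark
greet = Word(alphas) + "," + Word(alphas) + "!"

# input string
hello = "Hello, World!"

# parse input string and print the resulting tokens
print(hello, "->", greet.parseString(hello))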


Topics:
big data ,python ,parsing ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.
