Parsing in Python: Tools and Libraries (Part 4)


In Part 4 of this 8-part series, we will look at three context-free parser generators: PLY, PlyPlus, and Pyleri.


Before continuing, be sure to check out Part 3 here!

Parser Generators

PLY

PLY doesn’t try to do anything more or less than provide basic lex/yacc functionality. In other words, it’s not a large parsing framework or a component of some larger system.

PLY is a stable and maintained tool with a long history, starting in 2001. It is also quite basic: there are no tools for the automatic creation of an AST, or anything else that a C developer of the previous century would call fancy stuff. It was created primarily as an instructional tool, which explains both its simplicity and its strong support for diagnostics and for catching mistakes in the grammar.

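A PLY grammar is written as Python code, with rules in a BNF-like format: token rules are variables or functions prefixed with t_, and for functions the regular expression is the function's docstring. The lexer and the parser can be used separately. The following example shows only the lexer, but the parser is defined in the same way.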

import ply.lex as lex

# List of token names.   This is always required
tokens = (
    'NUMBER',
    'PLUS',
    'MINUS',
    'TIMES',
    'DIVIDE',
    'LPAREN',
    'RPAREN',
)

# Regular expression rules for simple tokens
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'

# A regular expression rule with some action code
# (the docstring holds the token's regular expression)
def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t

# Define a rule so we can track line numbers
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)

# A string containing ignored characters (spaces and tabs)
t_ignore  = ' \t'

# Error handling rule: report the character and skip past it
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)

# Build the lexer
lexer = lex.lex()
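The parser side follows the same conventions. As a rough sketch, assuming PLY 3.x is installed, the following self-contained example pairs a trimmed-down version of the lexer (only NUMBER, PLUS, TIMES and parentheses) with a few ply.yacc rules that evaluate arithmetic directly in the rule actions. The token subset, the rule names and the evaluate-while-parsing approach are illustrative choices, not taken from the PLY documentation. Each p_ function's docstring holds one BNF-like production, and assigning to p[0] sets the value of the left-hand side.

```python
import ply.lex as lex
import ply.yacc as yacc

# A trimmed-down token set (an assumption made to keep the sketch short)
tokens = ('NUMBER', 'PLUS', 'TIMES', 'LPAREN', 'RPAREN')

t_PLUS = r'\+'
t_TIMES = r'\*'
t_LPAREN = r'\('
t_RPAREN = r'\)'
t_ignore = ' \t'


def t_NUMBER(t):
    r'\d+'
    t.value = int(t.value)
    return t


def t_error(t):
    raise SyntaxError("Illegal character %r" % t.value[0])


# Grammar rules: each docstring holds one BNF-like production.
# The first rule defined (expr) becomes the start symbol.
def p_expr_plus(p):
    'expr : expr PLUS term'
    p[0] = p[1] + p[3]


def p_expr_term(p):
    'expr : term'
    p[0] = p[1]


def p_term_times(p):
    'term : term TIMES factor'
    p[0] = p[1] * p[3]


def p_term_factor(p):
    'term : factor'
    p[0] = p[1]


def p_factor_number(p):
    'factor : NUMBER'
    p[0] = p[1]


def p_factor_group(p):
    'factor : LPAREN expr RPAREN'
    p[0] = p[2]


def p_error(p):
    raise SyntaxError("Syntax error at %r" % (p.value if p else "end of input"))


lexer = lex.lex()
# Skip writing parsetab.py and parser.out to disk
parser = yacc.yacc(write_tables=False, debug=False)

print(parser.parse("2 * (3 + 4)", lexer=lexer))  # 14
```

Splitting expressions into expr, term and factor is one way to give * a higher precedence than +; PLY also accepts a precedence table for grammars written in a more ambiguous style.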

The documentation is extensive and clear, with abundant examples and explanations of parsing concepts — all that you need if you can get past the '90s looks.

There is a port for RPython called RPLY.

PlyPlus

Plyplus is a general-purpose parser built on top of PLY (LALR(1)) and written in Python. Plyplus features a modern design and focuses on simplicity without losing power.

PlyPlus is built on top of PLY, but it is very different from it; even the authors and the capitalization of the name differ. Its documentation is thinner than that of its parent, but it offers many features.

You can write a grammar in a .g file or in a string, but the parser is always generated dynamically. The format is based on EBNF, but a grammar can also include special notations that simplify the creation of an AST. These notations allow you to exclude or drop certain rules from the generated tree.

Example calc.g:

// from the documentation
start: add;

// Rules
?add: (add add_symbol)? mul;
?mul: (mul mul_symbol)? atom;
// rules preceded by @ will not appear in the tree
@atom: neg | number | '\(' add '\)';
neg: '-' atom;

// Tokens
number: '[\d.]+';
mul_symbol: '\*' | '/';
add_symbol: '\+' | '-';

WS: '[ \t]+' (%ignore);
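To make the effect of the ? and @ prefixes more concrete, here is a toy post-pass in plain Python. It is not PlyPlus code: the Tree class, the shape function and the hand-built input tree are hypothetical, and it only mimics the behavior described for the prefixes (an @ rule never appears and hands its children to its parent; a ? rule is collapsed when it has a single child).

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Tree:
    rule: str
    children: List[Union["Tree", str]] = field(default_factory=list)


def shape(node, collapse_single=("add", "mul"), always_inline=("atom",)):
    """Toy pass mimicking PlyPlus's '?' and '@' rule prefixes (hypothetical).

    Returns a list because an inlined node hands its children to the parent.
    """
    if isinstance(node, str):  # token text is kept unchanged
        return [node]
    kids = []
    for child in node.children:
        kids.extend(shape(child, collapse_single, always_inline))
    if node.rule in always_inline:  # '@rule': never appears in the tree
        return kids
    if node.rule in collapse_single and len(kids) == 1:  # '?rule' with one child
        return kids
    return [Tree(node.rule, kids)]


# Hand-built tree for "-2", shaped like the calc.g rules (an assumption, not real PlyPlus output)
number = Tree("number", ["2"])
neg = Tree("neg", ["-", Tree("atom", [number])])
raw = Tree("start", [Tree("add", [Tree("mul", [Tree("atom", [neg])])])])

result = shape(raw)[0]
print(result)
# Tree(rule='start', children=[Tree(rule='neg', children=['-', Tree(rule='number', children=['2'])])])
```

The chain of add, mul and atom nodes disappears, leaving only the nodes that carry meaning, which is the kind of simplification the notation is meant to provide.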

PlyPlus includes a function, based on pydot and Graphviz, to draw an image of a parse tree. PlyPlus also has a unique feature: it allows you to select nodes in the AST using selectors similar to the CSS selectors used in web development. For instance, if you want to find all terminal nodes that contain the letter "n", you can select them like this:

# from the documentation
>>> x.select('/.*n.*/:is-leaf')
['Popen', 'isinstance', 'basestring', 'stdin']

This is a unique feature that can be useful, for example, if you are developing a static analysis or refactoring tool.
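PlyPlus implements its selectors itself. The following toy function, which is not the PlyPlus API, only sketches the idea behind a query such as /.*n.*/:is-leaf: walk the tree and collect the leaves whose text matches a regular expression. The tuple-based tree, the select_leaves name and the use of re.fullmatch are assumptions made for illustration.

```python
import re


def select_leaves(tree, pattern):
    """Toy analogue of a '/pattern/:is-leaf' selector (hypothetical helper).

    A tree is a (rule, children) tuple; leaves are plain strings.
    Returns matching leaves in depth-first, left-to-right order.
    """
    regex = re.compile(pattern)
    found = []

    def walk(node):
        if isinstance(node, str):
            if regex.fullmatch(node):
                found.append(node)
        else:
            _rule, children = node
            for child in children:
                walk(child)

    walk(tree)
    return found


# Hypothetical tree for: isinstance(x, basestring); y
tree = ("module", [
    ("call", [("name", ["isinstance"]), ("name", ["x"]), ("name", ["basestring"])]),
    ("name", ["y"]),
])

print(select_leaves(tree, r".*n.*"))  # ['isinstance', 'basestring']
```

Rule names such as "name" are not returned even though they contain an "n", because only leaves are tested.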

Pyleri

Python Left-Right Parser (Pyleri) is part of a family of similar parser generators for JavaScript, Python, C, and Go.

A grammar for Pyleri must be defined as Python expressions inside a class. Once defined, the grammar can be exported as a file that defines it in Python or in any of the other supported languages. Apart from this peculiarity, Pyleri is simple and easy to use.

Pyleri example in Python:

# from the documentation
from pyleri import Grammar, Keyword, Regex, Sequence

# Create a Grammar Class to define your language
class MyGrammar(Grammar):
    r_name = Regex('(?:"(?:[^"]*)")+')
    k_hi = Keyword('hi')
    START = Sequence(k_hi, r_name)

# Compile your grammar by creating an instance of the Grammar Class.
my_grammar = MyGrammar()

# Use the compiled grammar to parse 'strings'
print(my_grammar.parse('hi "Iris"').is_valid) # => True
print(my_grammar.parse('bye "Iris"').is_valid) # => False

Published at DZone with permission of Gabriele Tomassetti, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
