Parsing in Python: Tools and Libraries (Part 4)
In Part 4 of this 8-part series, we will look at three types of context-free parser generators: PLY, PlyPlus, and Pyleri.
Before continuing, be sure to check out Part 3 here!
PLY doesn’t try to do anything more or less than provide basic lex/yacc functionality. In other words, it’s not a large parsing framework or a component of some larger system.
PLY is a stable and maintained tool with a long history, starting in 2001. It is also quite basic: there are no tools for automatically creating an AST, or anything else that a C developer of the previous century would consider fancy. It was primarily created as an instructional tool, which explains both its simplicity and its great support for diagnostics and for catching mistakes in the grammar.
A PLY grammar is written in Python code in a BNF-like format. Lexer and parser functions can be used separately. The following example shows only the lexer, but the parser works in the same way.
    import ply.lex as lex

    # List of token names. This is always required
    tokens = (
        'NUMBER',
        'PLUS',
        'MINUS',
        'TIMES',
        'DIVIDE',
        'LPAREN',
        'RPAREN',
    )

    # Regular expression rules for simple tokens
    t_PLUS = r'\+'
    t_MINUS = r'-'
    t_TIMES = r'\*'
    t_DIVIDE = r'/'
    t_LPAREN = r'\('
    t_RPAREN = r'\)'

    # A regular expression rule with some action code
    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(t):
        r'\n+'
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(t):
        # t.value holds the rest of the input, so report only the first character
        print("Illegal character '%s'" % t.value[0])
        t.lexer.skip(1)

    # Build the lexer
    lexer = lex.lex()
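To see what these token rules amount to without installing anything, here is a minimal sketch that applies the same regular expressions with the standard `re` module. It is an illustration only, not PLY's implementation; the names `TOKEN_SPEC`, `MASTER_RE`, and `tokenize` are hypothetical, and tokens are returned as plain tuples rather than PLY token objects.

```python
import re

# The same token rules as the PLY lexer, written as (name, regex) pairs.
# TOKEN_SPEC, MASTER_RE and tokenize are illustrative names, not part of PLY.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("NEWLINE", r"\n+"),
    ("SKIP", r"[ \t]+"),
    ("ERROR", r"."),  # anything not matched by an earlier rule
]
MASTER_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Return a list of (type, value, lineno) tuples for the input text."""
    tokens = []
    lineno = 1
    for match in MASTER_RE.finditer(text):
        kind = match.lastgroup
        value = match.group()
        if kind == "NEWLINE":
            lineno += len(value)  # same bookkeeping as t_newline
        elif kind == "SKIP":
            continue  # same effect as t_ignore
        elif kind == "ERROR":
            print("Illegal character '%s'" % value)  # same report as t_error
        else:
            if kind == "NUMBER":
                value = int(value)  # same conversion as t_NUMBER
            tokens.append((kind, value, lineno))
    return tokens


print(tokenize("3 + 4"))
# [('NUMBER', 3, 1), ('PLUS', '+', 1), ('NUMBER', 4, 1)]
```

Each tuple carries the token type, its converted value, and the line number on which it was found, which is the information a parser needs from a lexer.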
The documentation is extensive and clear, with abundant examples and explanations of parsing concepts — all that you need if you can get past the '90s looks.
There is a port for RPython called RPLY.
Plyplus is a general-purpose parser built on top of PLY (LALR(1)) and written in Python. Plyplus features a modern design and focuses on simplicity without losing power.
PlyPlus is built on top of PLY, but it is a very different tool: it has a different author and even a different naming style. Compared to its parent, the documentation is lacking, but the features are many.
You can write a grammar in a .g file or in a string, but it is always generated dynamically. The format is based on EBNF, but a grammar can also include special notations to simplify the creation of an AST. This notation allows you to exclude or drop certain rules from the generated tree.
    // from the documentation
    start: add;

    // Rules
    ?add: (add add_symbol)? mul;
    ?mul: (mul mul_symbol)? atom;
    // rules preceded by @ will not appear in the tree
    @atom: neg | number | '\(' add '\)';
    neg: '-' atom;

    // Tokens
    number: '[\d.]+';
    mul_symbol: '\*' | '/';
    add_symbol: '\+' | '-';
    WS: '[ \t]+' (%ignore);
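The effect of dropping rules from the tree can be sketched in plain Python. This is not PlyPlus code: the tuple-based tree, the `shape` function, and the two sets are hypothetical. It also rests on one assumption: that a rule marked with `?` is replaced by its child only when it has exactly one child, while a rule marked with `@` is always replaced by its children.

```python
# Trees are (rule_name, [children]); leaves are plain strings.
# These sets mirror the markers in the grammar, under the stated assumption.
ALWAYS_INLINE = {"atom"}           # rules marked with @
INLINE_IF_SINGLE = {"add", "mul"}  # rules marked with ?


def shape(node):
    """Return the list of nodes that should replace `node` in its parent."""
    if isinstance(node, str):
        return [node]
    name, children = node
    shaped = []
    for child in children:
        shaped.extend(shape(child))
    if name in ALWAYS_INLINE:
        return shaped  # the rule never appears in the tree
    if name in INLINE_IF_SINGLE and len(shaped) == 1:
        return shaped  # a pass-through node adds no information
    return [(name, shaped)]


# Full derivation of the input "2": start -> add -> mul -> atom -> number
raw = ("start", [("add", [("mul", [("atom", [("number", ["2"])])])])])
print(shape(raw))
# [('start', [('number', ['2'])])]
```

The four levels of nesting that exist only to encode operator precedence collapse into a single `number` node, which is the kind of simplification these notations are meant to provide.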
PlyPlus includes a function to draw an image of a parse tree, based on pydot and graphviz. It also lets you select nodes in the AST using selectors similar to the CSS selectors used in web development. For instance, if you want to find all terminal nodes that contain the letter "n," you can select them like this:
    # from the documentation
    >>> x.select('/.*n.*/:is-leaf')
    ['Popen', 'isinstance', 'basestring', 'stdin']
This feature can be useful, for example, if you are developing a static analysis or refactoring tool.
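To make the idea of such a query concrete, the following sketch does by hand what a "leaf matching a regular expression" selector expresses. It does not use PlyPlus or its selector engine; the toy tree and the `select_leaves` function are hypothetical, and the sketch assumes the pattern must match the whole leaf text.

```python
import re

# A toy tree: (name, [children]) for branches, plain strings for leaves.
# This representation is illustrative and is not a PlyPlus type.
tree = ("call", [
    ("name", ["Popen"]),
    ("args", [
        ("name", ["isinstance"]),
        ("name", ["cmd"]),
        ("name", ["basestring"]),
    ]),
])


def select_leaves(node, pattern):
    """Collect, in document order, every leaf whose text fully matches pattern."""
    if isinstance(node, str):
        return [node] if re.fullmatch(pattern, node) else []
    _name, children = node
    found = []
    for child in children:
        found.extend(select_leaves(child, pattern))
    return found


print(select_leaves(tree, r".*n.*"))
# ['Popen', 'isinstance', 'basestring']
```

A selector language replaces this kind of hand-written traversal with a one-line declarative query, which is where the saving comes from in analysis and refactoring tools.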
A grammar for Pyleri must be defined in a Python expression as part of a class. Once defined, the grammar can be exported as a file defining the grammar in Python or any other supported language. Apart from this peculiarity, Pyleri is a simple and easy-to-use tool.
Pyleri example in Python:
    # from the documentation
    from pyleri import Grammar, Keyword, Regex, Sequence

    # Create a Grammar Class to define your language
    class MyGrammar(Grammar):
        r_name = Regex('(?:"(?:[^"]*)")+')
        k_hi = Keyword('hi')
        START = Sequence(k_hi, r_name)

    # Compile your grammar by creating an instance of the Grammar Class.
    my_grammar = MyGrammar()

    # Use the compiled grammar to parse 'strings'
    print(my_grammar.parse('hi "Iris"').is_valid)  # => True
    print(my_grammar.parse('bye "Iris"').is_valid)  # => False
Published at DZone with permission of Gabriele Tomassetti, DZone MVB. See the original article here.