Create Regular Expressions in XML With the Regexml Open Source Library


Regular expressions are great at parsing portions of text out of a string or determining whether text matches a specific pattern. However, this power comes at a cost. Regular expressions can be very complex to write, hard to document, and difficult to understand. The Regexml project provides a simple way to define and document complex regular expressions in XML. For example, this simple Regexml expression defines a zip (postal) code:

<regexml xmlns="http://schemas.regexml.org/expressions">
    <expression id="zipcode">
        <match equals="\d" min="5" capture="true"/> <!-- 5 digit zip code -->
        <group min="0">
            <match equals="-"/>
            <match equals="\d" min="4" capture="true"/> <!-- optional "plus 4" -->
        </group>
    </expression>
</regexml>

After consuming this XML, the Regexml library creates and caches a standard java.util.regex.Pattern object that can be used to parse data out of text or determine if the text matches a pattern. The capture attribute in the XML above indicates the portions of the text that should be parsed out and made available to the client application. The equivalent regular expression looks like this:
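For comparison, here is a minimal sketch of a hand-written pattern for the same zip code format, using java.util.regex directly. The pattern string, the class name ZipCodeExample, and the parse helper are illustrative assumptions rather than anything produced by Regexml. The sketch also assumes that min="5" and min="4" mean exactly five and four digits, as the XML comments suggest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ZipCodeExample {
    // Assumed hand-written counterpart of the "zipcode" expression:
    // five digits (captured), then an optional hyphen followed by
    // four digits (captured). The outer group is non-capturing.
    private static final Pattern ZIP =
            Pattern.compile("(\\d{5})(?:-(\\d{4}))?");

    /**
     * Returns {zip, plus4} when the whole text is a zip code
     * (plus4 is null if the "plus 4" part is absent), or null
     * when the text does not match.
     */
    static String[] parse(String text) {
        Matcher m = ZIP.matcher(text);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String input : new String[] { "12345", "12345-6789", "1234" }) {
            String[] parts = parse(input);
            System.out.println(input + " -> "
                    + (parts == null ? "no match" : parts[0] + ", " + parts[1]));
        }
    }
}
```

The two capturing groups play the role of the capture="true" attributes in the XML, and the trailing ? corresponds to the group's min="0".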


Though the traditional regular expression is far shorter, its brevity and cryptic symbols make it harder to read and understand. Of course, for simple expressions like this one, a traditional regular expression specified in the code may be most appropriate. However, as expressions become more complex, the ability to document them in-line, employ whitespace to show hierarchy, and use expressive attributes rather than symbols can simplify maintenance and debugging. For more information about the open source Regexml project, see the overview and comprehensive introduction at:


