
Regular Expressions Examples


Regular expressions are widely used by developers all over the world. See some common examples of REs and how you can create your own expressions from scratch.


Regular expressions provide a powerful way to handle text-parsing tasks. Whether the problem is a simple search for a pattern or splitting a string into components, regular expressions are widely used today.

Let's learn how to build regular expressions for matching some common patterns. The purpose of this article is to teach you how to build regex patterns so you can understand and build your own.

Before we begin building regular expressions, let's review the basics a bit. There are many excellent regular expression guides on the Internet. Wikipedia provides a general outline of regular expressions, including their history. Mozilla’s site provides a regular expression reference that applies to regexes in JavaScript. And the documentation for Java’s Pattern class describes regex patterns as used in Java.

Floating Point Numbers

We will now attempt to build a regular expression for matching floating point numbers in a variety of formats. In the tables below, an x in the first column marks an input that the regex does not match in its entirety. Let's now gradually build the regex to match the inputs shown.

A decimal integer can easily be matched with a regular expression of the form \d+. This is suitable for a number without a fractional part or a preceding sign.

Regex: \d+
  |  1 |2389
x |  2 |3.14
x |  3 |23.
x |  4 |45.0
x |  5 |.388
x |  6 |0.564
x |  7 |278e-8
x |  8 |399e4
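
The tables in this article can be reproduced with a few lines of code. Here is a minimal sketch in Python, assuming a row counts as matched only when the regex matches the entire input. The helper name match_table is hypothetical and not part of the original article.

```python
import re

def match_table(pattern, inputs):
    """Return one row per input; an 'x' marks inputs the pattern does not fully match."""
    regex = re.compile(pattern)
    rows = []
    for number, text in enumerate(inputs, start=1):
        # fullmatch succeeds only if the whole string matches the pattern
        marker = " " if regex.fullmatch(text) else "x"
        rows.append(f"{marker} | {number:2d} |{text}")
    return rows

inputs = ["2389", "3.14", "23.", "45.0", ".388", "0.564", "278e-8", "399e4"]
for row in match_table(r"\d+", inputs):
    print(row)
```

Swapping in each regex from the sections below should reproduce the corresponding table.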

Fractional Part

Let's update the regex to handle the fractional part. We append \.\d+, which matches a decimal point followed by one or more digits.

Regex: \d+\.\d+
x |  1 |2389
  |  2 |3.14
x |  3 |23.
  |  4 |45.0
x |  5 |.388
  |  6 |0.564
x |  7 |278e-8
x |  8 |399e4

To handle plain integers, let us make the fractional part optional by using the construct (\.\d+)?.

Regex: \d+(\.\d+)?
  |  1 |2389
  |  2 |3.14
x |  3 |23.
  |  4 |45.0
x |  5 |.388
  |  6 |0.564
x |  7 |278e-8
x |  8 |399e4

How about making the fractional digits optional? After all, 23. is still a valid floating point number!

Regex: \d+(\.\d*)?
  |  1 |2389
  |  2 |3.14
  |  3 |23.
  |  4 |45.0
x |  5 |.388
  |  6 |0.564
x |  7 |278e-8
x |  8 |399e4

Looks like we need to make the integer part optional, too.

Regex: (\d+)?(\.\d*)?
  |  1 |2389
  |  2 |3.14
  |  3 |23.
  |  4 |45.0
  |  5 |.388
  |  6 |0.564
x |  7 |278e-8
x |  8 |399e4
  |  9 |

However, if both the integer and fractional part are optional, then our regex will match an empty string — which is not good at all! So let's make one of them mandatory. Here is one way to achieve that.

Regex: (\d+(\.\d*)?|(\d+)?\.\d+)
  |  1 |2389
  |  2 |3.14
  |  3 |23.
  |  4 |45.0
  |  5 |.388
  |  6 |0.564
x |  7 |278e-8
x |  8 |399e4
x |  9 |
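
To see what the alternation buys us, here is a small Python sketch comparing the two patterns on a few edge cases. The variable names loose and strict are illustrative only.

```python
import re

# Both parts optional: also accepts the empty string and a lone "."
loose = re.compile(r"(\d+)?(\.\d*)?")
# Alternation: at least one digit is required on one side of the point
strict = re.compile(r"(\d+(\.\d*)?|(\d+)?\.\d+)")

for text in ["", ".", "23.", ".388"]:
    print(repr(text), bool(loose.fullmatch(text)), bool(strict.fullmatch(text)))
```

The loose pattern accepts all four inputs, while the strict one rejects the empty string and the lone decimal point but still accepts 23. and .388.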

Exponents

Looks like we almost got it. Let's now update the regex to cover the exponent part. We append (e[-+]?\d+)? to the above expression.

Regex: (\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?
  |  1 |2389
  |  2 |3.14
  |  3 |23.
  |  4 |45.0
  |  5 |.388
  |  6 |0.564
  |  7 |278e-8
  |  8 |399e4
x |  9 |

Optional Sign

And that covers most of it. Let's now update the regex for an optional preceding sign. We prepend [-+]? to the regex.

Regex: [-+]?(\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?
  |  1 |2389
  |  2 |3.14
  |  3 |23.
  |  4 |45.0
  |  5 |.388
  |  6 |0.564
  |  7 |278e-8
  |  8 |399e4
  |  9 |-69
  | 10 |+44.474774e-499
x | 11 |

And here is the complete regex for matching a floating point number: [-+]?(\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?
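
A regex of this length is easier to maintain when each piece is annotated. Here is a sketch using Python's re.VERBOSE flag, which ignores whitespace and allows comments inside the pattern. The names FLOAT_RE and parse_float are hypothetical, and the conversion step is an addition for illustration.

```python
import re

FLOAT_RE = re.compile(r"""
    [-+]?                 # optional sign
    (                     # mantissa, one of:
        \d+(\.\d*)?       #   digits, optionally followed by a point and more digits
      | (\d+)?\.\d+       #   an optional integer part, a point, and digits
    )
    (e[-+]?\d+)?          # optional exponent (lowercase e only, as in the article)
""", re.VERBOSE)

def parse_float(text):
    """Return float(text) if text matches FLOAT_RE in full, else None."""
    return float(text) if FLOAT_RE.fullmatch(text) else None

for text in ["2389", "23.", ".388", "278e-8", "-69", "", "1e"]:
    print(repr(text), parse_float(text))
```

Inputs such as the empty string or 1e (an exponent marker with no digits) are rejected before float() is ever called.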

Phone Numbers

Next up in the list of commonly used regex patterns is the phone number. Let us start with a US-style phone number matched using \d+.

Regex: \d+
  |  1 |4159892626
x |  2 |(823)383-2245
x |  3 |377-333-3459
x |  4 |945 330-3322
x |  5 |447.332.4455
  |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402

Let's make arrangements for separators between the area code and the number.

Regex: (\d{3})[-. ](\d{3})[-. ](\d{4})
x |  1 |4159892626
x |  2 |(823)383-2245
  |  3 |377-333-3459
  |  4 |945 330-3322
  |  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402

Oops! No longer matches a number without separators. Let us make the separators optional.

Regex: (\d{3})[-. ]?(\d{3})[-. ]?(\d{4})
  |  1 |4159892626
x |  2 |(823)383-2245
  |  3 |377-333-3459
  |  4 |945 330-3322
  |  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402

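How about accounting for enclosing the area code in parentheses? Note that the hyphen comes first in the character class, so it is treated as a literal character rather than as a range operator.

Regex: \(?(\d{3})[-). ]?(\d{3})[-. ]?(\d{4})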
  |  1 |4159892626
  |  2 |(823)383-2245
  |  3 |377-333-3459
  |  4 |945 330-3322
  |  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402
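
One thing to watch in separator classes like these: a hyphen between two characters inside a character class forms a range. Written as [\)-. ], the class matches every character from ) to . (which includes *, +, and ,) as well as the space. Placing the hyphen first, as in [-). ], keeps it literal. A quick check in Python, with illustrative variable names:

```python
import re

range_class = re.compile(r"[\)-. ]")    # '\)-.' is a range from ')' to '.'
literal_class = re.compile(r"[-). ]")   # hyphen first, so it is a literal

for ch in [")", "-", ".", " ", "*", "+", ","]:
    print(repr(ch), bool(range_class.fullmatch(ch)), bool(literal_class.fullmatch(ch)))
```

Both classes accept the four intended separators, but only the range form also accepts *, +, and the comma.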

Let's now take care of the country code, possibly preceded by a +, by prepending (\+?\d+)? to the regex.

Regex: (\+?\d+)?\(?(\d{3})[-). ]?(\d{3})[-. ]?(\d{4})
  |  1 |4159892626
  |  2 |(823)383-2245
  |  3 |377-333-3459
  |  4 |945 330-3322
  |  5 |447.332.4455
  |  6 |18008982334
x |  7 |1-800-238-4767
  |  8 |+14669374402

Looks like we need to update the separator between the country code and the area code.

Regex: (\+?\d+)?[-(.]?(\d{3})[-). ]?(\d{3})[-. ]?(\d{4})
  |  1 |4159892626
  |  2 |(823)383-2245
  |  3 |377-333-3459
  |  4 |945 330-3322
  |  5 |447.332.4455
  |  6 |18008982334
  |  7 |1-800-238-4767
  |  8 |+14669374402
  |  9 |+1.800.399.3378

A note about this regex for parsing phone numbers: It accepts phone numbers of the form 1-800)456 2334, which might be characterized as ugly if not downright wrong. Correcting for these cases would make the regex more complex, so maybe it is better to cover 90% of the cases and not worry about these edge cases.
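
Since the regex already captures the country code, area code, and the two halves of the number in groups, those groups can be used to normalize whatever was matched. Here is a sketch in Python, with the separator classes written hyphen-first so the hyphen stays literal. The function name normalize_phone and the dash-separated output format are assumptions made for illustration.

```python
import re

PHONE_RE = re.compile(r"(\+?\d+)?[-(.]?(\d{3})[-). ]?(\d{3})[-. ]?(\d{4})")

def normalize_phone(text):
    """Return the number as 'AAA-EEE-NNNN', prefixed by the country code if present."""
    match = PHONE_RE.fullmatch(text)
    if match is None:
        return None
    country, area, exchange, line = match.groups()
    local = f"{area}-{exchange}-{line}"
    # The country-code group is None when the input had no country code
    return f"{country}-{local}" if country else local

for text in ["(823)383-2245", "+14669374402", "447.332.4455", "1-800)456 2334", "12345"]:
    print(repr(text), normalize_phone(text))
```

Note that the ugly input 1-800)456 2334 is still accepted and normalized, which illustrates the trade-off described above.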

Email Addresses

An email address has the general form: name@company.com. Let's start with this form and progressively enhance it.

Regex: \w+@\w+\.\w+
  |  1 |j@x.org
  |  2 |abc@joe.com
x |  3 |abc@joe-blow.org

The first problem is that characters like + and - are valid within names (both before the @ and after).

Regex: [-\+\w]+@[-+\w]+\.[-+\w]+
  |  1 |j@x.org
  |  2 |abc@joe.com
  |  3 |abc@joe-blow.org

Next, the domain name part can have two or three components separated by a period (.). Here is one way to take care of it:

Regex: [-\+\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?
  |  1 |j@x.org
  |  2 |abc@joe.com
  |  3 |abc@joe-blow.org
  |  4 |abc+def@joe.co.uk

We have one more issue: The name can include a period. Update the part before the @ to reflect this condition.

Regex: [-\+\.\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?
  |  1 |j@x.org
  |  2 |abc@joe.com
  |  3 |abc@joe-blow.org
  |  4 |abc+def@joe.co.uk
  |  5 |abc.d+efg@joe.com
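
To get a feel for where this pattern draws the line, here is a short Python sketch that runs it against the sample addresses plus two extra inputs. The extra inputs and the name EMAIL_RE are additions for illustration.

```python
import re

EMAIL_RE = re.compile(r"[-\+\.\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?")

samples = [
    "j@x.org",
    "abc@joe.com",
    "abc@joe-blow.org",
    "abc+def@joe.co.uk",
    "abc.d+efg@joe.com",
    "abc@localhost",   # no period in the domain part
    "a@b.c.d.e",       # four domain components
]
results = {s: bool(EMAIL_RE.fullmatch(s)) for s in samples}
for address, matched in results.items():
    print(" " if matched else "x", address)
```

The five addresses from the table match. The last two are rejected because the pattern expects exactly two or three domain components, a reminder that this regex is a practical approximation rather than a complete email validator.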

Simple HTML

Parsing HTML is complex and cannot be done entirely with regular expressions. It is better to use a library suited to your programming language for this task. Having said that, there are a few situations where using a regex to parse HTML might be useful. The regexes below provide a solution with the following caveats.

  • Does not handle common HTML errors such as missing start or end tags.
  • Self-closing XML-style tags (e.g., <br/>) are not accepted.
  • These characters must be escaped properly: <, >, and &.

Let us start with a few simple situations and enhance the regex.

Regex: <(\w+)>[^<]*</\1>
  |  1 |<p>Hello world</p>

This does not accept attributes in the start tag. Not exactly HTML, eh? Well, let's cover that case.

Regex: <(\w+)[^>]*>[^<]*</\1>
  |  1 |<p>Hello world</p>
  |  2 |<a href="http://www.google.com">Google it!</a>

The regex for the content between the HTML tags is unnecessarily strict. As it stands, the regex does not allow nested HTML such as <p>Hello <b>there</b></p>. So let's fix it.

Regex: <(\w+)[^>]*>.*</\1>
  |  1 |<p>Hello world</p>
  |  2 |<a href="http://www.google.com">Google it!</a>
  |  3 |<p>Hello <b>world</b></p>

This is about the extent of the HTML that can be parsed with a simple regex.
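
The \1 backreference is what ties the end tag to the start tag. Here is a Python sketch that uses it to pull out the tag name and the content. The extra capturing group around .* and the name parse_element are additions for illustration and not part of the regex above.

```python
import re

# Group 1 is the tag name; group 2 (added here) is the content between the tags
ELEMENT_RE = re.compile(r"<(\w+)[^>]*>(.*)</\1>")

def parse_element(html):
    """Return (tag, content) if html is a single matching element, else None."""
    match = ELEMENT_RE.fullmatch(html)
    return (match.group(1), match.group(2)) if match else None

print(parse_element("<p>Hello <b>world</b></p>"))
print(parse_element("<p>Hello</b>"))             # end tag does not match the start tag
print(parse_element("<p>one</p><p>two</p>"))     # two siblings are read as one element
```

The last call shows one more limitation: because .* is greedy, two sibling elements with the same tag are treated as a single element whose content is one</p><p>two.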

Conclusion

We covered some regular expression samples for common use cases including parsing for floating point numbers, phone numbers, email addresses, and HTML code, with an emphasis on learning how to write regex patterns rather than handing out ready-made recipes.


Published at DZone with permission of Jay Sridhar, DZone MVB. See the original article here.
