# Regular Expressions Examples

### Regular expressions are widely used by developers all over the world. See some common examples of REs and how you can create your own expressions from scratch.

Regular expressions provide a powerful way to handle text-parsing tasks. Whether the problem is a simple search for a pattern or splitting a string into components, regular expressions are widely used today.

Let's learn how to build regular expressions for matching some common patterns. The goal is to show how each pattern is built up step by step, so that you can understand the results and write your own.

Before we begin building regular expressions, let's briefly review the basics. There are many excellent regular expression guides on the Internet. Wikipedia provides a general outline of regular expressions, including their history. Mozilla’s site provides a regular expression reference that applies to regexes in JavaScript. And the documentation for Java’s Pattern class describes regex patterns as used in Java.

## Floating Point Numbers

We will now build a regular expression for matching floating point numbers in a variety of formats. In the output below, an `x` in the first column marks an input that the regex does not match in its entirety. Let's gradually build the regex to match all the inputs shown.

A decimal integer can easily be matched with a regular expression of the form `\d+`. This is suitable for a number without a fractional part or a preceding sign.

``````Regex: \d+
|  1 |2389
x |  2 |3.14
x |  3 |23.
x |  4 |45.0
x |  5 |.388
x |  6 |0.564
x |  7 |278e-8
x |  8 |399e4``````
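The match tables in this article can be reproduced with a few lines of code. The sketch below uses Python's `re` module and assumes that an input counts as matched only when the regex covers the entire string (`re.fullmatch`), which is consistent with `3.14` being rejected by `\d+` above. The helper name `match_table` is made up for this illustration.

```python
import re

def match_table(pattern, inputs):
    """Return one formatted row per input; 'x' marks inputs that do not fully match."""
    regex = re.compile(pattern)
    rows = []
    for number, text in enumerate(inputs, start=1):
        # fullmatch succeeds only if the pattern covers the whole input string
        marker = " " if regex.fullmatch(text) else "x"
        rows.append(f"{marker} | {number:2d} |{text}")
    return rows

samples = ["2389", "3.14", "23.", "45.0", ".388", "0.564", "278e-8", "399e4"]
pattern = r"\d+"

print("Regex:", pattern)
for row in match_table(pattern, samples):
    print(row)
```

Swapping in a different `pattern` string is enough to check each later step against the same inputs.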

### Fractional Part

Let's extend the regex to handle the fractional part. We append `\.\d+`, which matches a decimal point followed by one or more digits.

``````Regex: \d+\.\d+
x |  1 |2389
|  2 |3.14
x |  3 |23.
|  4 |45.0
x |  5 |.388
|  6 |0.564
x |  7 |278e-8
x |  8 |399e4``````

To handle plain integers, let us make the fractional part optional by using the construct `(\.\d+)?`.

``````Regex: \d+(\.\d+)?
|  1 |2389
|  2 |3.14
x |  3 |23.
|  4 |45.0
x |  5 |.388
|  6 |0.564
x |  7 |278e-8
x |  8 |399e4``````

How about making the fractional digits optional? After all, `23.` is still a valid floating point number!

``````Regex: \d+(\.\d*)?
|  1 |2389
|  2 |3.14
|  3 |23.
|  4 |45.0
x |  5 |.388
|  6 |0.564
x |  7 |278e-8
x |  8 |399e4``````

Looks like we need to make the integer part optional, too.

``````Regex: (\d+)?(\.\d*)?
|  1 |2389
|  2 |3.14
|  3 |23.
|  4 |45.0
|  5 |.388
|  6 |0.564
x |  7 |278e-8
x |  8 |399e4
|  9 |``````

However, if both the integer and fractional parts are optional, then our regex will match an empty string, which is not good at all! So let's make at least one of them mandatory. Here is one way to achieve that.

``````Regex: (\d+(\.\d*)?|(\d+)?\.\d+)
|  1 |2389
|  2 |3.14
|  3 |23.
|  4 |45.0
|  5 |.388
|  6 |0.564
x |  7 |278e-8
x |  8 |399e4
x |  9 |``````
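To see the effect of the alternation, here is a small Python sketch (assuming full-string matching, as in the tables) that compares the two regexes. The variable names are illustrative only. As a side effect of requiring at least one digit, a lone `.` is rejected as well.

```python
import re

# Both parts optional: the empty string slips through
both_optional = re.compile(r"(\d+)?(\.\d*)?")
# Alternation: either digits with an optional fraction,
# or an optional integer part with a mandatory fraction
one_required = re.compile(r"(\d+(\.\d*)?|(\d+)?\.\d+)")

for text in ["", ".", "23.", ".388"]:
    loose = bool(both_optional.fullmatch(text))
    strict = bool(one_required.fullmatch(text))
    print(f"{text!r:8} both_optional={loose} one_required={strict}")
```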

### Exponents

Looks like we almost have it. Let's now update the regex to cover the exponent part. We append `(e[-+]?\d+)?` to the above expression.

``````Regex: (\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?
|  1 |2389
|  2 |3.14
|  3 |23.
|  4 |45.0
|  5 |.388
|  6 |0.564
|  7 |278e-8
|  8 |399e4
x |  9 |``````

### Optional Sign

And that covers most of it. Let's now update for an optional preceding sign. We prepend `[-+]?` to the regex.

``````Regex: [-+]?(\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?
|  1 |2389
|  2 |3.14
|  3 |23.
|  4 |45.0
|  5 |.388
|  6 |0.564
|  7 |278e-8
|  8 |399e4
|  9 |-69
| 10 |+44.474774e-499
x | 11 |``````

And here is the complete regex for matching a floating point number: `[-+]?(\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?`
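As a usage sketch, the complete regex can act as a guard before converting text to a number. The Python example below assumes full-string matching; `FLOAT_RE` and `parse_float` are names invented for this illustration. Note that the regex as written accepts only a lowercase `e` in the exponent, so `1E5` is rejected; compiling with `re.IGNORECASE` would be one way to accept it.

```python
import re
from typing import Optional

# The complete regex from the article, unchanged
FLOAT_RE = re.compile(r"[-+]?(\d+(\.\d*)?|(\d+)?\.\d+)(e[-+]?\d+)?")

def parse_float(text: str) -> Optional[float]:
    """Return the numeric value if the whole string matches FLOAT_RE, else None."""
    if FLOAT_RE.fullmatch(text) is None:
        return None
    return float(text)

for text in ["2389", "23.", ".388", "278e-8", "-69", "", "1E5", "abc"]:
    print(repr(text), "->", parse_float(text))
```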

## Phone Numbers

Next up in the list of commonly used regex patterns is the phone number. Let us start with a US-style phone number matched using `\d+`.

``````Regex: \d+
|  1 |4159892626
x |  2 |(823)383-2245
x |  3 |377-333-3459
x |  4 |945 330-3322
x |  5 |447.332.4455
|  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402``````

Let's make arrangements for separators between the area code and the number.

``````Regex: (\d{3})[-. ](\d{3})[-. ](\d{4})
x |  1 |4159892626
x |  2 |(823)383-2245
|  3 |377-333-3459
|  4 |945 330-3322
|  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402``````

Oops! No longer matches a number without separators. Let us make the separators optional.

``````Regex: (\d{3})[-. ]?(\d{3})[-. ]?(\d{4})
|  1 |4159892626
x |  2 |(823)383-2245
|  3 |377-333-3459
|  4 |945 330-3322
|  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402``````

How about accounting for enclosing the area code in parentheses?

``````Regex: \(?(\d{3})[-\). ]?(\d{3})[-. ]?(\d{4})
|  1 |4159892626
|  2 |(823)383-2245
|  3 |377-333-3459
|  4 |945 330-3322
|  5 |447.332.4455
x |  6 |18008982334
x |  7 |1-800-238-4767
x |  8 |+14669374402``````

Let's now take care of the country code, possibly preceded by a `+`.

``````Regex: (\+?\d+)?\(?(\d{3})[-\). ]?(\d{3})[-. ]?(\d{4})
|  1 |4159892626
|  2 |(823)383-2245
|  3 |377-333-3459
|  4 |945 330-3322
|  5 |447.332.4455
|  6 |18008982334
x |  7 |1-800-238-4767
|  8 |+14669374402``````

Looks like we need to update the separator between the country code and the area code.

``````Regex: (\+?\d+)?[-\(.]?(\d{3})[-\). ]?(\d{3})[-. ]?(\d{4})
|  1 |4159892626
|  2 |(823)383-2245
|  3 |377-333-3459
|  4 |945 330-3322
|  5 |447.332.4455
|  6 |18008982334
|  7 |1-800-238-4767
|  8 |+14669374402
|  9 |+1.800.399.3378``````

A note about this regex for parsing phone numbers: It accepts phone numbers of the form `1-800)456 2334`, which might be characterized as ugly if not downright wrong. Correcting for these cases would make the regex more complex, so maybe it is better to cover 90% of the cases and not worry about these edge cases.
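The capturing groups in this regex also make it easy to pull the number apart. The Python sketch below is one way to do that, assuming full-string matching. `PHONE_RE` and `split_phone` are hypothetical names, and the character classes are written with the hyphen first so that it is unambiguously a literal character rather than part of a range.

```python
import re

# Groups: optional country code, area code, next three digits, last four digits
PHONE_RE = re.compile(r"(\+?\d+)?[-(.]?(\d{3})[-). ]?(\d{3})[-. ]?(\d{4})")

def split_phone(text):
    """Return (country, area, prefix, line) if the whole string matches, else None."""
    match = PHONE_RE.fullmatch(text)
    if match is None:
        return None
    # A missing country code is reported as None by groups()
    return match.groups()

for text in ["4159892626", "(823)383-2245", "+1.800.399.3378", "1-800)456 2334", "12345"]:
    print(text, "->", split_phone(text))
```

The fourth input shows the edge case mentioned above: the mismatched parenthesis is still accepted.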

## Email Addresses

An email address has the general form: `name@company.com`. Let's start with this form and progressively enhance it.

``````Regex: \w+@\w+\.\w+
|  1 |j@x.org
|  2 |abc@joe.com
x |  3 |abc@joe-blow.org``````

The first problem is that characters like `+` and `-` are valid within names (both before the `@` and after).

``````Regex: [-\+\w]+@[-+\w]+\.[-+\w]+
|  1 |j@x.org
|  2 |abc@joe.com
|  3 |abc@joe-blow.org``````

Next, the domain name part can have two or three components separated by a period (`.`). Here is one way to take care of it:

``````Regex: [-\+\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?
|  1 |j@x.org
|  2 |abc@joe.com
|  3 |abc@joe-blow.org
|  4 |abc+def@joe.co.uk``````

We have one more issue: The name can include a period. Update the part before the `@` to reflect this condition.

``````Regex: [-\+\.\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?
|  1 |j@x.org
|  2 |abc@joe.com
|  3 |abc@joe-blow.org
|  4 |abc+def@joe.co.uk
|  5 |abc.d+efg@joe.com``````
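Here is a short Python sketch that applies the final email regex to the sample addresses and to a few inputs it should reject. It assumes full-string matching; `EMAIL_RE` and `looks_like_email` are illustrative names, and the extra test inputs are not from the tables above.

```python
import re

# The final email regex from the article, unchanged
EMAIL_RE = re.compile(r"[-\+\.\w]+@[-+\w]+\.[-+\w]+(\.[-+\w]+)?")

def looks_like_email(text: str) -> bool:
    """True if the whole string has the name@domain shape described by EMAIL_RE."""
    return EMAIL_RE.fullmatch(text) is not None

accepted = ["j@x.org", "abc@joe-blow.org", "abc+def@joe.co.uk", "abc.d+efg@joe.com"]
rejected = ["abc@joe", "abc def@joe.com", "a@b.c.d.e"]  # no dot, a space, four domain parts

for text in accepted + rejected:
    print(text, "->", looks_like_email(text))
```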

## Simple HTML

Parsing HTML is complex and cannot be done completely with regular expressions. It is better to use a parsing library suitable for your programming language to accomplish this task. Having said that, there are a few situations where using a regex to parse HTML might be useful. The regexes below provide a solution with the following caveats.

• Does not handle common HTML errors such as missing start or end tags.
• Self-closing XML-style tags (e.g., `<br/>`) are not accepted.
• These characters must be escaped properly: `<`, `>`, and `&`.

Let us start with a few simple situations and enhance the regex.

``````Regex: <(\w+)>[^<]*</\1>
|  1 |<p>Hello world</p>``````

This does not accept attributes in the start tag. Not exactly HTML, eh? Well, let's cover that case.

``````Regex: <(\w+)[^>]*>[^<]*</\1>
|  1 |<p>Hello world</p>``````

The regex for the content between the HTML tags is unnecessarily strict. As it stands, the regex does not allow nested HTML such as `<p>Hello <b>there</b></p>`. So let's fix it.

``````Regex: <(\w+)[^>]*>.*</\1>
|  1 |<p>Hello world</p>
|  3 |<p>Hello <b>world</b></p>``````

This is about the extent of the HTML that can be parsed with a simple regex.
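The key ingredient in these HTML regexes is the backreference `\1`, which requires the end tag to repeat whatever name the first group captured. The Python sketch below illustrates this under the same full-string matching assumption. `ELEMENT_RE` and `tag_name` are made-up names, and the inputs with an attribute and with mismatched tags are examples added here.

```python
import re

# The final HTML regex from the article; \1 refers back to the captured tag name
ELEMENT_RE = re.compile(r"<(\w+)[^>]*>.*</\1>")

def tag_name(text):
    """Return the outer tag name if the whole string is one matching element, else None."""
    match = ELEMENT_RE.fullmatch(text)
    return match.group(1) if match else None

print(tag_name("<p>Hello <b>world</b></p>"))      # outer tag is p
print(tag_name("<p class='intro'>Hello</p>"))     # attributes are skipped by [^>]*
print(tag_name("<p>Hello</b>"))                   # end tag differs, so no match
print(tag_name("<br/>"))                          # self-closing tag, no match
```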

## Conclusion

We covered some regular expression samples for common use cases including parsing for floating point numbers, phone numbers, email addresses, and HTML code, with an emphasis on learning how to write regex patterns rather than handing out ready-made recipes.


Published at DZone with permission of Jay Sridhar, DZone MVB.
