The Regex That Broke a Server

I never thought I would see a server become unresponsive because of a bad regex matcher, but that is exactly what happened to one of our services.

Let’s assume we parse some external dealer car info. We are trying to find all those cars with “no air conditioning” among various available input patterns (but without matching patterns such as “mono air conditioning”).

The regex that broke our service looks like this:

String TEST_VALUE = "ABS, traction control, front and side airbags, Isofix child seat anchor points, no air conditioning, electric windows, \r\nelectrically operated door mirrors";
// System.nanoTime() returns a long, so keep the timestamps as long values
long start = System.nanoTime();
// Warning: the nested quantifiers in this pattern backtrack catastrophically
Pattern pattern = Pattern.compile("^(?:.*?(?:\\s|,)+)*no\\s+air\\s+conditioning.*$");
assertTrue(pattern.matcher(TEST_VALUE).matches());
long end = System.nanoTime();
// Divide by 1000.0 to report fractional microseconds
LOGGER.info("Took {} micros", (end - start) / 1000.0);

After 2 minutes this test was still running and one CPU core was fully overloaded.

First, the matches method always tests the entire input, so the start (^) and end ($) anchors are redundant. Second, the input string contains line terminators (\r\n), so we compile the pattern in MULTILINE mode. (Note that MULTILINE only changes where ^ and $ match; the flag that lets the dot match line terminators is Pattern.DOTALL.)

Pattern pattern = Pattern.compile("(?:.*?(?:\\s|,)+)*no\\s+air\\s+conditioning.*?", Pattern.MULTILINE);
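Since the two flags are easy to confuse, here is a minimal sketch of how MULTILINE and DOTALL differ on an input containing \r\n. The class and method names are made up for illustration and are not part of the original service code.

```java
import java.util.regex.Pattern;

// Hypothetical helper that contrasts the two flags on the input "a\r\nb".
final class LineTerminatorFlags {

    // Does "a.*b" match the whole input, i.e. can the dot cross \r\n?
    static boolean dotCrossesLines(int flags) {
        return Pattern.compile("a.*b", flags).matcher("a\r\nb").matches();
    }

    // Can ^ and $ anchor on the second line alone?
    static boolean anchorsMatchPerLine(int flags) {
        return Pattern.compile("^b$", flags).matcher("a\r\nb").find();
    }

    public static void main(String[] args) {
        System.out.println("default   dot crosses lines: " + dotCrossesLines(0));
        System.out.println("MULTILINE dot crosses lines: " + dotCrossesLines(Pattern.MULTILINE));
        System.out.println("DOTALL    dot crosses lines: " + dotCrossesLines(Pattern.DOTALL));
        System.out.println("default   anchors per line:  " + anchorsMatchPerLine(0));
        System.out.println("MULTILINE anchors per line:  " + anchorsMatchPerLine(Pattern.MULTILINE));
    }
}
```

Only DOTALL lets the dot consume the line terminator, while only MULTILINE lets ^ and $ match at line boundaries inside the input.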

Let’s see how multiple versions of this regex behave:

Regex: "(?:.*?(?:\\s|,)+)*no\\s+air\\s+conditioning.*?"
Duration [microseconds]: 35699.334
Observation: This is way too slow.

Regex: "(?:.*?(?:\\s|,)+)?no\\s+air\\s+conditioning.*?"
Duration [microseconds]: 108.686
Observation: The non-capturing group doesn't need the zero-or-more (*) quantifier, so we can replace it with zero-or-one (?).

Regex: "(?:.*?\\b)?no\\s+air\\s+conditioning.*?"
Duration [microseconds]: 153.636
Observation: It handles more inputs than the previous version, which accepted only whitespace (\s) or a comma (,) before the matched pattern.

Regex: "\\bno\\s+air\\s+conditioning"
Duration [microseconds]: 78.831
Observation: This one is used with find() rather than matches(). find() is much faster, and we are only interested in the first occurrence of the pattern.
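The last variant relies on find() rather than matches(). A minimal sketch of how it could be wired up is shown below; the class and method names are illustrative assumptions, and only the regex itself comes from the comparison above.

```java
import java.util.regex.Pattern;

// Hypothetical wrapper around the find()-based variant.
final class NoAirConditioningMatcher {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NO_AIR_CONDITIONING =
            Pattern.compile("\\bno\\s+air\\s+conditioning");

    // find() looks for the first occurrence anywhere in the input, so the
    // pattern needs no leading or trailing ".*" to consume the rest of it.
    static boolean hasNoAirConditioning(String description) {
        return NO_AIR_CONDITIONING.matcher(description).find();
    }

    public static void main(String[] args) {
        String value = "ABS, traction control, no air conditioning, electric windows, "
                + "\r\nelectrically operated door mirrors";
        System.out.println(hasNoAirConditioning(value));
        System.out.println(hasNoAirConditioning("mono air conditioning"));
    }
}
```

The word boundary (\b) is what rejects "mono air conditioning", and \s+ tolerates tabs or repeated spaces between the tokens.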

Why not use String.indexOf() instead?

While this would be much faster than using a regex, we would still have to handle the start of the string, patterns such as “mono air conditioning”, and tabs or multiple space characters between our pattern tokens. Such custom implementations may be faster, but they are less flexible and take more time to write.
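To give a feel for that extra work, here is a rough sketch of what an indexOf-based check might look like. It is an illustration under stated assumptions, not the service's code: the names are hypothetical, and the boundary and whitespace checks only approximate the regex semantics of \b and \s.

```java
// Hypothetical hand-rolled alternative to the regex.
final class IndexOfMatcher {

    static boolean hasNoAirConditioning(String text) {
        int from = 0;
        while (true) {
            int start = text.indexOf("no", from);
            if (start < 0) {
                return false; // no more candidates
            }
            from = start + 1;
            // Reject "mono ...": the character before "no" must not be a
            // letter or digit (an approximation of the regex \b).
            if (start > 0 && Character.isLetterOrDigit(text.charAt(start - 1))) {
                continue;
            }
            int afterNo = start + 2;
            int pos = skipWhitespace(text, afterNo);
            // At least one whitespace character is required, then "air".
            if (pos == afterNo || !text.startsWith("air", pos)) {
                continue;
            }
            int afterAir = pos + 3;
            pos = skipWhitespace(text, afterAir);
            if (pos == afterAir) {
                continue;
            }
            if (text.startsWith("conditioning", pos)) {
                return true;
            }
        }
    }

    // Character.isWhitespace is close to, but not identical to, the regex \s.
    private static int skipWhitespace(String text, int pos) {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        return pos;
    }
}
```

Even this short version needs a retry loop, a boundary check, and two whitespace scans, which is the flexibility cost mentioned above.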

Conclusion

Regex is a fine tool for pattern matching, but you must not take it for granted, since small changes can yield big differences. The first regex performed so poorly because of catastrophic backtracking, a phenomenon every developer should be aware of before writing regular expressions.
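One way to observe backtracking without relying on a stopwatch is to count how many characters the matcher reads. The sketch below wraps the input in a CharSequence that counts charAt calls and gives up after a cap. Everything here (class names, the cap, the sample input) is an assumption made for illustration, and the exact counts depend on the JDK's regex implementation.

```java
import java.util.regex.Pattern;

// Hypothetical probe: counts character reads made by java.util.regex.
final class CountingCharSequence implements CharSequence {

    static final class CapExceeded extends RuntimeException { }

    private final CharSequence delegate;
    private final long cap;
    long reads;

    CountingCharSequence(CharSequence delegate, long cap) {
        this.delegate = delegate;
        this.cap = cap;
    }

    @Override
    public char charAt(int index) {
        // Abort once the read budget is spent, so a runaway pattern cannot hang.
        if (++reads > cap) {
            throw new CapExceeded();
        }
        return delegate.charAt(index);
    }

    @Override
    public int length() {
        return delegate.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return delegate.subSequence(start, end);
    }

    @Override
    public String toString() {
        return delegate.toString();
    }
}

final class BacktrackingProbe {

    // Returns the number of characters read by matches(), or -1 if the cap was hit.
    static long charReads(String regex, String input, long cap) {
        CountingCharSequence counted = new CountingCharSequence(input, cap);
        try {
            Pattern.compile(regex).matcher(counted).matches();
            return counted.reads;
        } catch (CountingCharSequence.CapExceeded e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // A shortened input that does NOT contain the phrase, so the match must fail.
        String input = "ABS, traction control, front and side airbags, electric windows";
        long cap = 1_000_000;
        System.out.println("starred group:  "
                + charReads("(?:.*?(?:\\s|,)+)*no\\s+air\\s+conditioning.*?", input, cap));
        System.out.println("optional group: "
                + charReads("(?:.*?(?:\\s|,)+)?no\\s+air\\s+conditioning.*?", input, cap));
    }
}
```

Comparing the two counts on a non-matching input gives a repeatable measure of how much extra work the nested quantifier causes.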





Topics:
java, high-perf, server-side, regex

Published at DZone with permission of Vlad Mihalcea. See the original article here.
