Revisiting Regular Expression
This article shows the versatility of regular expressions in day-to-day work and lists some useful references.
A regular expression is a sequence of characters that defines a pattern, which can then be used to filter or match information. This article is not a tutorial on regular expressions; instead, it shows how versatile they are in day-to-day work and lists some useful references.
Data is everywhere and is often called the new oil of the modern era. It can be broadly categorized as binary data (an MP3 file, a PNG file, etc.), which in most cases is not human-readable, and text data (emails, notes, etc.), which is more human-friendly. Regular expressions are well suited to searching for content in text data.
Making sense of data is at the heart of data mining and data science. Regular expressions are used to identify whether a pattern exists in a given sequence of characters and to locate the position of that pattern within the text. They help in manipulating textual data and gaining insight from it. Tools like OpenRefine use regular expressions heavily to trim and clean data.
It is also interesting to note that some very common tasks that we perform on a computer use regular expressions internally, such as:
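As a minimal sketch of those three uses (checking whether a pattern exists, locating it, and manipulating the text), the following Python snippet uses the standard re module. The sample sentence and the date pattern are made up for illustration and do not come from any particular data set.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Order 1042 shipped on 2021-03-15 to the warehouse."

# Illustrative pattern: an ISO-style date (four digits, two digits, two digits).
date_pattern = re.compile(r"\d{4}-\d{2}-\d{2}")

# search() returns a match object if the pattern occurs anywhere, else None.
match = date_pattern.search(text)
if match:
    # group() is the matched text; start() and end() give its position.
    print(match.group(), match.start(), match.end())

# sub() replaces every match, a simple form of text manipulation.
redacted = date_pattern.sub("<DATE>", text)
print(redacted)
```

The match object answers both the "does it exist" and the "where is it" questions, while sub() covers the clean-up case.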
- Input Validation: Most commonly seen on web pages for validating fields like email, password, etc.
- Text Editor: Most editors, like Sublime Text or Visual Studio Code, allow us to search file contents using regular expressions.
- Linux Terminals: Linux commands use regular expressions heavily. Commands like grep, find, and less use regex to find files and content.
- SQL Database: Databases like MySQL and Oracle support regular expressions for matching rows. For example, in MySQL:
SELECT name FROM student_tbl WHERE name REGEXP '^sa'
- NoSQL Database: NoSQL databases like MongoDB support regular expressions for string pattern matching using the $regex operator.
- Data Cleaning: Tools like OpenRefine, through its General Refine Expression Language (GREL, formerly Google Refine Expression Language), use regular expressions to clean data.
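A few of the tasks in this list can be sketched in Python with the standard re module. The helper names below (looks_like_email, names_matching, collapse_whitespace) are hypothetical, and the email pattern is deliberately simplified; real-world email validation is considerably more involved. Also note that Python matches case-sensitively by default, whereas a database may behave differently depending on its configuration.

```python
import re

# Simplified, illustrative email shape: something@something.tld
# This is an assumption for the sketch, not a complete email validator.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(value):
    """Return True if the whole string fits the simplified email shape."""
    return EMAIL_RE.fullmatch(value) is not None


def names_matching(pattern_text, names):
    """Keep names where the pattern matches, similar in spirit to a SQL REGEXP filter."""
    pattern = re.compile(pattern_text)
    return [name for name in names if pattern.search(name)]


def collapse_whitespace(value):
    """Squeeze runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


print(looks_like_email("alice@example.com"))
print(names_matching(r"^sa", ["sam", "lisa", "sara"]))
print(collapse_whitespace("  New   York \t City "))
```

The second helper uses the same '^sa' pattern as the SQL query above: the ^ anchor restricts the match to names that start with "sa".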
Regular Expression Engines
Note that not all programming languages support regular expressions in the same way. There are multiple regular expression engines (such as PCRE and RE2), and each programming language either uses an existing engine or implements its own. As a result, there are slight differences in syntax and performance across languages. Wikipedia has a nice comparison table of regular expression engines.
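One way to get a feel for such differences without leaving a single language is Python's re.ASCII flag. The sketch below is only an illustration of the general point: for text patterns, Python 3 treats \d as any Unicode decimal digit, while some other engines (RE2 and JavaScript, for instance) restrict \d to 0-9, which is what re.ASCII emulates here.

```python
import re

# Three Arabic-Indic digits, written as escapes to keep the source ASCII-only.
arabic_indic = "\u0661\u0662\u0663"

# Default behavior for str patterns: \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d+", arabic_indic)

# With re.ASCII, \d only matches the characters 0 through 9.
ascii_match = re.fullmatch(r"\d+", arabic_indic, flags=re.ASCII)

print(unicode_match is not None, ascii_match is not None)
```

The same pattern text can therefore accept or reject the same input depending on the engine and its settings, which is worth checking when moving a pattern between languages.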
The following are some of the popular programming languages that support regular expressions.
Java: Java supports regular expressions through the java.util.regex package.
Python: A popular high-level scripting language with a comprehensive built-in regular expression library (the re module).
Ruby: Ruby has comprehensive regular expression support as a language feature.
PHP: PHP also has comprehensive support for regular expressions.
Perl: Perl is great for text processing and supports regular expressions as a language feature.
The following are some useful online resources:
regexone: Online interactive tutorial
jex.im: Regular expression visualizer
regexr.com: Online tool for testing regular expressions