DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Last call! Secure your stack and shape the future! Help dev teams across the globe navigate their software supply chain security challenges.

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Releasing software shouldn't be stressful or risky. Learn how to leverage progressive delivery techniques to ensure safer deployments.

Avoid machine learning mistakes and boost model performance! Discover key ML patterns, anti-patterns, data strategies, and more.

Related

  • CockroachDB TIL: Volume 11
  • JSON-Based Serialized LOB Pattern
  • SQL Commands: A Brief Guide
  • Non-blocking Database Migrations

Trending

  • The Role of Functional Programming in Modern Software Development
  • Apache Doris vs Elasticsearch: An In-Depth Comparative Analysis
  • Customer 360: Fraud Detection in Fintech With PySpark and ML
  • My LLM Journey as a Software Engineer Exploring a New Domain
  1. DZone
  2. Data Engineering
  3. Databases
  4. String Operations on Phone Numbers in SQL

String Operations on Phone Numbers in SQL

Most DBAs and developers need to not only capture phone numbers, but make them usable. Formatting can make this a tricky task. Let’s look at how to solve it

By 
Shanika WIckramasinghe user avatar
Shanika WIckramasinghe
·
Dec. 15, 21 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
6.0K Views

Join the DZone community and get the full member experience.

Join For Free

Article Image

When dealing with SQL databases, users will have to interact with different data types and schemas depending on the project's requirements. Since databases are used in many different applications and support multiple platforms, devices, and users, there may be differences in how data is entered, even in a simple field like a phone number.

For example, one set of users will enter the phone number with the country code, while the others may simply ignore the country code. Therefore, it is essential for database users to know how to clean data in a SQL database. In this post, we will see how to utilize SQL string operations on phone number fields to properly clean, format, and extract data.

Why Do We Care About Data Cleaning?

Data comes in all forms and sizes. Most of the time, it will be raw data, without proper formatting. Data cleaning allows users to filter, sort, and format this raw data to extract meaningful information from it. Data cleaning also makes data analysis a much more efficient process as an additional benefit.

Let's assume we have a text field that captures user information and asks users to enter valuable information like phone numbers. We need to clean this text field to make the phone numbers usable. How would we go about doing that? To begin with, we need to understand SQL string operations.

What Are SQL String Operations?

String operations are functions that can be used to manipulate strings. These string operators (also called string functions) take a string as input, process it according to the specified function, and return a string as the output. The availability of these functions will differ depending on the database. For example, the following is a list of commonly used string functions in MySQL:

Common Types of String Operations

Function

Usage

LEFT / RIGHT

Extract a specified number of characters from a string (starting from left or right)

TRIM

Remove leading or trailing spaces

POSITION / STRPOS

Analogous to the Location function, where a numerical value of the position of the specified substring is provided.

SUBSTR

Extract the specified substring from a specific location to match the specified length

CONCAT

Combine two or more strings to make a single string

COALESCE

Replace null values with a specified value

FORMAT

Formats the value to the specified format

REGEXP

Matches string to a regular expression

REGEXP_SUBSTR

Extracts value for the matching regular expression

STR

Convert numerical value to string

REVERSE

Reverses the specified string

REPLACE

Replaces the specified string with the given value.

The full list of available string functions and operators is available in the official documentation.

How To Use String Operations With Phone Numbers

Now we have a basic understanding of string operations and related functions in SQL! So, let's look at how exactly we can utilize some of these functions on phone numbers to extract and format data. For all the examples provided here, we will be using the Arctype SQL client with a MySQL database in a Windows environment with the following dataset.

Screenshot - 1

Properly Format a Phone Number

We can utilize the SUBSTR command to format phone numbers. It will extract parts of the number and use the CONCAT to join the specific parts together. In the following example, we have broken down the phone number into three distinct sections and combined them together with the formatting as a new field called num_formatted:

SELECT
    phone_num,
    CONCAT(
        '(',
        SUBSTR(phone_num, 1, 3),
        ') ',
        SUBSTR(phone_num, 4, 3),
        '-',
        SUBSTR(phone_num, 7)
    ) AS num_formatted
FROM
    EmpDetails


The result of the operation is shown below:

Screenshot - 2

You can create a new column to save the formatted number using the ALTER TABLE command. It will create a new column and update it with the formatted numbers.

ALTER TABLE EmpDetails
ADD formatted_number VARCHAR(255);
    
UPDATE
    EmpDetails
SET
    formatted_number = (
        SELECT
            CONCAT(
                '(',
                SUBSTR(phone_num, 1, 3),
                ') ',
                SUBSTR(phone_num, 4, 3),
                '-',
                SUBSTR(phone_num, 7)
            ) AS num_formatted
    )


The result of the above operation is shown below:

Screenshot - 3

Screenshot - 4

We can modify our SQL command to include the REPLACE function if there are additional characters in the phone number field, such as dashes and brackets. This function can be used to remove unnecessary characters and whitespaces in a string. Now let's further format the formmated_number field to include the country code.

SELECT
    formatted_number,
    CONCAT(
        '+1 '
        '(',
        SUBSTR(num_cleansed, 1, 3),
        ') ',
        SUBSTR(num_cleansed, 5, 3),
        '-',
        SUBSTR(num_cleansed, 7)
    ) AS num_formatted
FROM
    (
        SELECT
            formatted_number,
            REPLACE(REPLACE(REPLACE(REPLACE(formatted_number, '(', ''), ')', ''), '-', ''), ' ','') as num_cleansed
        FROM
            EmpDetails
    ) FormattedPhoneNumbers


In the above statement, the REPLACE function is used to remove the brackets, dashes, and whitespaces from the formatted_number field before undergoing the formatting process.

Screenshot - 5

Extracting Parts of a Phone Number

We can use the LEFT and RIGHT functions to specify and extract different parts of the phone number. In the following example, we will extract the area code of the phone number using the LEFT function and the last four digits using the RIGHT function based on the num_formatted field.

SELECT
    REPLACE(LEFT(formatted_number, 4), '(', '') As 'Area Code',
    RIGHT(formatted_number, 4) AS 'Last Four Digits'
FROM
    EmpDetails


Here, the REPLACE function is used to remove the bracket from the selected number block. The result of the query is shown below.

Screenshot - 6

Extracting Phone Numbers from a Text Block

One of the most common tasks in the world of data cleaning is extracting phone numbers from a text block. The complexity and feasibility of this task will mostly depend on the composition of the text.

Extracting a Phone Number With Regular Expressions

The easiest way to extract phone numbers is to utilize regular expressions to target the specific phone number formats. Extracting data has become far simpler with the introduction of functions like REGEXP_SUBSTR in MySQL 8.0. We will be populating the details column with some phone numbers in different formats, as shown below.

Screenshot - 7

Identifying Rows with Matching Phone Numbers

First, we need to figure out which rows consist of data matching our requirements. In this case, the following regular expressions will be used on the details field.

  • Any consecutive 10 digits
  • Formatted number (XXX-XXX-XXXX)
SELECT
    *
FROM
    EmpDetails
WHERE
    # Any 10 Digits
    details REGEXP '[[:digit:]]{10}'
    # Formatted Number (XXX-XXX-XXXX)
    OR details REGEXP '[0-9]{3}-[0-9]{3}-[0-9]{4}';

Results of the query can be seen below:

Screenshot - 8

Extracting the Phone Numbers

Since we have identified the rows, the next step is to extract the phone numbers. It can be done using the REGEXP_SUBSTR function to extract the substring which matches the given regular expression. As we need to query two different regular expressions, we will be using the CONCAT_WS function to combine the results of both expressions into a single column.

SELECT
    emp_id,
    name,
    email,
    CONCAT_WS(
        '',
        REGEXP_SUBSTR(details, '^[0-9]+$', 1, 1, 'm'),
        REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, 1, 'm')
    ) AS extracted_phone_numbers
FROM
    (
        SELECT
            *
        FROM
            EmpDetails
        WHERE
            details REGEXP '[[:digit:]]{10}'
            OR details REGEXP '[0-9]{3}-[0-9]{3}-[0-9]{4}'
) ExtractedDetails


The result of this operation can be seen below:

Handling Multiple Phone Numbers in a Single Field

To query results from a single field with multiple numbers, we need to create a stored procedure that loops through the desired field to capture all matching regex patterns. For instance, let's see how to extract multiple phone numbers from the details field of emp_id 1702 (Dan).

DELIMITER $$
CREATE PROCEDURE get_number_of_matches_full()
BEGIN
  DECLARE regex_match INT DEFAULT 1;
  DECLARE current_number INT DEFAULT 1;
  WHILE regex_match >= 1 DO
    CREATE TABLE IF NOT EXISTS extracted_numbers_table (
        `emp_id` int NOT NULL,
        `name` varchar(255) DEFAULT NULL,
        `email` varchar(255) DEFAULT NULL,
        `extracted_phone_num` varchar(255) DEFAULT NULL
    );
    INSERT INTO extracted_numbers_table (emp_id, name, email, extracted_phone_num)
    SELECT emp_id, name, email, REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, current_number, 'm') FROM EmpDetails WHERE emp_id = 1702;
    SET current_number = current_number + 1;
    IF ((SELECT REGEXP_SUBSTR(details, '[0-9]{3}-[0-9]{3}-[0-9]{4}', 1, current_number, 'm') FROM EmpDetails WHERE emp_id = 1702) IS NULL) THEN
        SET regex_match = 0;
    END IF;
  END WHILE;
END $$
    
DELIMITER;


The result of this operation is shown below.

Screenshot - 9

In the above code block, we have created a stored procedure called get_number_of_matches_full, which loops through the details field until all the regex matches are found on the specified row of the EmpDetails table. We are using the REGEXP_SUBSTR function with the position argument to extract different matches. It updates by one at each loop run and saves the results on a newly created extracted_numbers_table. Finally, the loop exits when a NULL result is found.

We can call this procedure and view the results by querying the extracted_numbers_table as shown below.

CALL get_number_of_matches_full;

SELECT * FROM extracted_numbers_table;


And, shown again in Arctype:

Screenshot - 10

Conclusion

String operations in SQL are vital functions to clean and format data within a database. Moreover, string operations are the core component when dealing with valuable information such as phone numbers as they allow users to extract or manipulate phone numbers according to their requirements. However, it's important to remember that the exact functions available will change depending on the database type and version. 

Therefore, always remember to refer to the documentation to find out the available string operations and implement them to your heart's content.

Strings Data Types sql MySQL Database

Published at DZone with permission of Shanika WIckramasinghe. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • CockroachDB TIL: Volume 11
  • JSON-Based Serialized LOB Pattern
  • SQL Commands: A Brief Guide
  • Non-blocking Database Migrations

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!