DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • 5 Key Postgres Advantages Over MySQL
  • Java EE 6 Pet Catalog with GlassFish and MySQL
  • Common Mistakes to Avoid When Writing SQL Code
  • SQL Commands: A Brief Guide

Trending

  • AI-Driven Root Cause Analysis in SRE: Enhancing Incident Resolution
  • Mastering Advanced Aggregations in Spark SQL
  • Integration Isn’t a Task — It’s an Architectural Discipline
  • Integrating Model Context Protocol (MCP) With Microsoft Copilot Studio AI Agents
  1. DZone
  2. Data Engineering
  3. Databases
  4. Sample Data Generation With Built-In Database Capabilities

Sample Data Generation With Built-In Database Capabilities

Learn how to use special functions, hierarchical queries, and common table expressions to generate sample data in PostgreSQL, Oracle, MySQL, and SQL Server.

By 
Denis Magda user avatar
Denis Magda
DZone Core CORE ·
Nov. 10, 23 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
4.1K Views

Join the DZone community and get the full member experience.

Join For Free

There are many ways to generate dummy data for a database. You can create a data generator, use a mock data service, or get a subset of your production data. Additionally, many databases offer built-in capabilities for producing synthetic data.

In this article, you’ll learn how to use special database functions, hierarchical queries, and recursive Common Table Expressions (CTEs) to generate a sample dataset in PostgreSQL, MySQL, Oracle, and SQL Server.

Sample Table

We'll use the following sample table to demonstrate various built-in data generation techniques:

SQL
 
create table sample(
  id int, 
  val text
);


The id column stores the record's identifier, while the val column holds some text value.

Once you learn how to generate mock data for the table with two columns, you'll be able to apply the techniques to tables with many more columns of various data types.

Generate Series Function in PostgreSQL

PostgreSQL offers a special generate_series function that can produce a set of numbers or date/time values for a specified range.

For instance, to generate a series of IDs from 1 to 5, you would use the following function call:

SQL
 
select id from generate_series(1, 5) as id;

+--+
|id|
+--+
|5 |
|4 |
|3 |
|2 |
|1 |
+--+


Afterward, you can insert the generated series into the sample table with this simple statement:

SQL
 
insert into sample(id)
select id from generate_series(1, 5) as id;


This query adds five records to the table, each with a unique identifier. However, the val column will still be null:

SQL
 
select * from sample order by id;

+--+----+
|id|val |
+--+----+
|1 |null|
|2 |null|
|3 |null|
|4 |null|
|5 |null|
+--+----+


To populate the val column with non-null values, you simply need to modify the data generation query as follows:

SQL
 
-- Delete previously generated records
delete from sample;

-- Generate 1000 records using the concat function to populate the `val` column
insert into sample(id,val)
select id, concat('val', id * 10) from generate_series(1, 1000) as id;


Finally, retrieve the first five records from the table to review the sample data:

SQL
 
select * from sample order by id limit 5;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Quick and simple. For more advanced data generation scenarios using the generate_series function, refer to this article.

Generate Series Again ... But in SQL Server

The latest versions of SQL Server have introduced support for the generate_series function. Consequently, whether you're using PostgreSQL or SQL Server, the experience of generating sample data is consistent.

To generate a series of numbers from 1 to 5, you would invoke the function as follows:

SQL
 
select value from generate_series(1,5);

+-----+
|value|
+-----+
|1    |
|2    |
|3    |
|4    |
|5    |
+-----+


Subsequently, execute the following command to create 1000 records in the sample table:

SQL
 
insert into sample(id,val)
select value,concat('val', value * 10) from generate_series(1,1000);


To check the generated data, query the first five records:

SQL
 
select * from sample order by id
offset 0 rows
fetch next 5 rows only;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


As demonstrated, the generate_series function is a powerful tool for data generation tasks. However, this function is not part of the SQL standard yet and may not be available in all database systems.

Hierarchical Queries in Oracle

Oracle is among the databases that do not support the generate_series function. However, the database community has devised many alternative methods for generating dummy data.

One popular method involves using hierarchical queries. For instance, the following hierarchical query can generate a series of records from 1 to 5:

SQL
 
select level from dual connect by level <= 5;

+-----+
|LEVEL|
+-----+
|1    |
|2    |
|3    |
|4    |
|5    |
+-----+


Internally, the query constructs a data tree structure, with the LEVEL pseudo-column indicating the depth of the tree, beginning with one for the root.

By using this hierarchical query with an insert statement, you can produce 1000 records for the sample table:

SQL
 
-- Oracle doesn't support the `text` data type,
-- requiring you to create the table this way
create table sample (id int, val varchar(10));

-- Generate 1000 records
insert into sample(id,val)
select level, concat('val', level * 10) from dual 
connect by level <= 1000;


The generated data will resemble what you've seen with PostgreSQL and SQL Server:

SQL
 
select * from sample order by id
offset 0 rows fetch next 5 rows only;

+--+-----+
|ID|VAL  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Recursive Common Table Expressions in MySQL

Like Oracle, MySQL does not support the generate_series function, which necessitates finding alternative methods for generating sample data.

One such method is using recursive common table expressions (CTEs). To illustrate, the following recursive query generates a series of numbers from 1 to 5:

SQL
 
with recursive seq as (
    select 1 as id union all select id +1 from seq where id < 5
)
select id from seq;

+--+
|id|
+--+
|1 |
|2 |
|3 |
|4 |
|5 |
+--+


Next, you can use this recursion together with the following insert statement to produce 1000 records:

SQL
 
insert into sample(id,val)
with recursive seq as (
    select 1 as id union all select id +1 from seq where id < 1000
)
select id, concat('val', id * 10) from seq;


Finally, to ensure the mock data has been properly generated, take a quick look at the first five records:

SQL
 
select * from sample order by id limit 5;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Keep Mastering Databases 

As you can see, relational databases are much more than just storage for your application data. They provide a broad set of capabilities, allowing you to perform various tasks right on the database end. One such task is sample data generation, which can sometimes be satisfied with built-in database features.

Stay tuned for more database-related content, and keep mastering your database skills.

In the meantime, you can watch this video to see how the discussed data generation techniques work in practice for much larger tables.


Dummy data MySQL Relational database Synthetic data sql PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • 5 Key Postgres Advantages Over MySQL
  • Java EE 6 Pet Catalog with GlassFish and MySQL
  • Common Mistakes to Avoid When Writing SQL Code
  • SQL Commands: A Brief Guide

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!