DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • 5 Key Postgres Advantages Over MySQL
  • Useful System Table Queries in Relational Databases
  • Java EE 6 Pet Catalog with GlassFish and MySQL
  • Common Mistakes to Avoid When Writing SQL Code

Trending

  • You Learned AI. So Why Are You Still Not Getting Hired?
  • Evaluating SOC Effectiveness Using Detection Coverage and Response Metrics
  • Bridging Gaps in SOC Maturity Using Detection Engineering and Automation
  • Stop Using Python for Your GenAI Apps, Use Go and Genkit Instead
  1. DZone
  2. Data Engineering
  3. Databases
  4. Sample Data Generation With Built-In Database Capabilities

Sample Data Generation With Built-In Database Capabilities

Learn how to use special functions, hierarchical queries, and common table expressions to generate sample data in PostgreSQL, Oracle, MySQL, and SQL Server.

By 
Denis Magda user avatar
Denis Magda
DZone Core CORE ·
Nov. 10, 23 · Tutorial
Likes (3)
Comment
Save
Tweet
Share
4.6K Views

Join the DZone community and get the full member experience.

Join For Free

There are many ways to generate dummy data for a database. You can create a data generator, use a mock data service, or get a subset of your production data. Additionally, many databases offer built-in capabilities for producing synthetic data.

In this article, you’ll learn how to use special database functions, hierarchical queries, and recursive Common Table Expressions (CTEs) to generate a sample dataset in PostgreSQL, MySQL, Oracle, and SQL Server.

Sample Table

We'll use the following sample table to demonstrate various built-in data generation techniques:

SQL
 
create table sample(
  id int, 
  val text
);


The id column stores the record's identifier, while the val column holds some text value.

Once you learn how to generate mock data for the table with two columns, you'll be able to apply the techniques to tables with many more columns of various data types.

Generate Series Function in PostgreSQL

PostgreSQL offers a special generate_series function that can produce a set of numbers or date/time values for a specified range.

For instance, to generate a series of IDs from 1 to 5, you would use the following function call:

SQL
 
select id from generate_series(1, 5) as id;

+--+
|id|
+--+
|5 |
|4 |
|3 |
|2 |
|1 |
+--+


Afterward, you can insert the generated series into the sample table with this simple statement:

SQL
 
insert into sample(id)
select id from generate_series(1, 5) as id;


This query adds five records to the table, each with a unique identifier. However, the val column will still be null:

SQL
 
select * from sample order by id;

+--+----+
|id|val |
+--+----+
|1 |null|
|2 |null|
|3 |null|
|4 |null|
|5 |null|
+--+----+


To populate the val column with non-null values, you simply need to modify the data generation query as follows:

SQL
 
-- Delete previously generated records
delete from sample;

-- Generate 1000 records using the concat function to populate the `val` column
insert into sample(id,val)
select id, concat('val', id * 10) from generate_series(1, 1000) as id;


Finally, retrieve the first five records from the table to review the sample data:

SQL
 
select * from sample order by id limit 5;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Quick and simple. For more advanced data generation scenarios using the generate_series function, refer to this article.

Generate Series Again ... But in SQL Server

The latest versions of SQL Server have introduced support for the generate_series function. Consequently, whether you're using PostgreSQL or SQL Server, the experience of generating sample data is consistent.

To generate a series of numbers from 1 to 5, you would invoke the function as follows:

SQL
 
select value from generate_series(1,5);

+-----+
|value|
+-----+
|1    |
|2    |
|3    |
|4    |
|5    |
+-----+


Subsequently, execute the following command to create 1000 records in the sample table:

SQL
 
insert into sample(id,val)
select value,concat('val', value * 10) from generate_series(1,1000);


To check the generated data, query the first five records:

SQL
 
select * from sample order by id
offset 0 rows
fetch next 5 rows only;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


As demonstrated, the generate_series function is a powerful tool for data generation tasks. However, this function is not part of the SQL standard yet and may not be available in all database systems.

Hierarchical Queries in Oracle

Oracle is among the databases that do not support the generate_series function. However, the database community has devised many alternative methods for generating dummy data.

One popular method involves using hierarchical queries. For instance, the following hierarchical query can generate a series of records from 1 to 5:

SQL
 
select level from dual connect by level <= 5;

+-----+
|LEVEL|
+-----+
|1    |
|2    |
|3    |
|4    |
|5    |
+-----+


Internally, the query constructs a data tree structure, with the LEVEL pseudo-column indicating the depth of the tree, beginning with one for the root.

By using this hierarchical query with an insert statement, you can produce 1000 records for the sample table:

SQL
 
-- Oracle doesn't support the `text` data type,
-- requiring you to create the table this way
create table sample (id int, val varchar(10));

-- Generate 1000 records
insert into sample(id,val)
select level, concat('val', level * 10) from dual 
connect by level <= 1000;


The generated data will resemble what you've seen with PostgreSQL and SQL Server:

SQL
 
select * from sample order by id
offset 0 rows fetch next 5 rows only;

+--+-----+
|ID|VAL  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Recursive Common Table Expressions in MySQL

Like Oracle, MySQL does not support the generate_series function, which necessitates finding alternative methods for generating sample data.

One such method is using recursive common table expressions (CTEs). To illustrate, the following recursive query generates a series of numbers from 1 to 5:

SQL
 
with recursive seq as (
    select 1 as id union all select id +1 from seq where id < 5
)
select id from seq;

+--+
|id|
+--+
|1 |
|2 |
|3 |
|4 |
|5 |
+--+


Next, you can use this recursion together with the following insert statement to produce 1000 records:

SQL
 
insert into sample(id,val)
with recursive seq as (
    select 1 as id union all select id +1 from seq where id < 1000
)
select id, concat('val', id * 10) from seq;


Finally, to ensure the mock data has been properly generated, take a quick look at the first five records:

SQL
 
select * from sample order by id limit 5;

+--+-----+
|id|val  |
+--+-----+
|1 |val10|
|2 |val20|
|3 |val30|
|4 |val40|
|5 |val50|
+--+-----+


Keep Mastering Databases 

As you can see, relational databases are much more than just storage for your application data. They provide a broad set of capabilities, allowing you to perform various tasks right on the database end. One such task is sample data generation, which can sometimes be satisfied with built-in database features.

Stay tuned for more database-related content, and keep mastering your database skills.

In the meantime, you can watch this video to see how the discussed data generation techniques work in practice for much larger tables.


Dummy data MySQL Relational database Synthetic data sql PostgreSQL

Opinions expressed by DZone contributors are their own.

Related

  • 5 Key Postgres Advantages Over MySQL
  • Useful System Table Queries in Relational Databases
  • Java EE 6 Pet Catalog with GlassFish and MySQL
  • Common Mistakes to Avoid When Writing SQL Code

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook