Provisioning SQL Databases With Initial Sets of Data Using Java and MockNeat
Look at how we can programmatically provision a relational database with initial sets of data by programmatically generating and executing complex SQL Inserts statements.
Join the DZone community and get the full member experience.
Join For FreeMockNeat is a "faker"-like Java library that is used to mock complex Java objects or to generate arbitrary data for a wide range of applications.
In this tutorial, I am going to talk about how we can programmatically provision a relational database with initial sets of data by programmatically generating and executing complex SQL Inserts statements.
For the demo, we are going to use MySQL and a sample SQL schema (HR App) that can be found here:
The schema contains 7 tables with different relantionships between them.
1. Setting up the Database
If you don't have MySQL installed, you can download the community edition it from here.
If you prefer to use docker as a quick alternative, you can set up and start MySQL with the following commands:
#This will pull the official mysql docker image
docker pull mysql
After the download finishes, you can start the MySQL container like this:
# This is will start a mysql instance on port 3306 (no password will be required)
docker run -p 3306:3306 -e MYSQL_ALLOW_EMPTY_PASSWORD=yes -d mysql --default-authentication-plugin=mysql_native_password
At this point, you should have MySQL up and running.
The easiest way to work and connect to MySQL is to install MySQL Worbench. By default, the IDE comes with a pre-defined connection to localhost:3306 (where 3306 is the default port on which MySQL operates).
The next step is to create the HR Schema. The DDL script I've used to create the tables is the following.
On purporse, I didn't include any constraints (like FKs) in the definition of the tables. Those constraints can be added after the actual inserts are generated and executed.
DROP SCHEMA IF EXISTS hr;
CREATE SCHEMA hr COLLATE = utf8_general_ci;
USE hr;
CREATE TABLE regions (
region_id INT (11) UNSIGNED NOT NULL,
region_name VARCHAR(25),
PRIMARY KEY (region_id)
);
CREATE TABLE countries (
country_id CHAR(2) NOT NULL,
country_name VARCHAR(40),
region_id INT (11) UNSIGNED NOT NULL,
PRIMARY KEY (country_id)
);
CREATE TABLE locations (
location_id INT (11) UNSIGNED NOT NULL AUTO_INCREMENT,
street_address VARCHAR(40),
postal_code VARCHAR(12),
city VARCHAR(30) NOT NULL,
state_province VARCHAR(25),
country_id CHAR(2) NOT NULL,
PRIMARY KEY (location_id)
);
CREATE TABLE departments (
department_id INT (11) UNSIGNED NOT NULL,
department_name VARCHAR(30) NOT NULL,
manager_id INT (11) UNSIGNED,
location_id INT (11) UNSIGNED,
PRIMARY KEY (department_id)
);
CREATE TABLE jobs (
job_id VARCHAR(10) NOT NULL,
job_title VARCHAR(35) NOT NULL,
min_salary DECIMAL(8, 0) UNSIGNED,
max_salary DECIMAL(8, 0) UNSIGNED,
PRIMARY KEY (job_id)
);
CREATE TABLE employees (
employee_id INT (11) UNSIGNED NOT NULL,
first_name VARCHAR(20),
last_name VARCHAR(25) NOT NULL,
email VARCHAR(25) NOT NULL,
phone_number VARCHAR(20),
hire_date DATE NOT NULL,
job_id VARCHAR(10) NOT NULL,
salary DECIMAL(8, 2) NOT NULL,
commission_pct DECIMAL(2, 2),
manager_id INT (11) UNSIGNED,
department_id INT (11) UNSIGNED,
PRIMARY KEY (employee_id)
);
CREATE TABLE job_history (
employee_id INT (11) UNSIGNED NOT NULL,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
job_id VARCHAR(10) NOT NULL,
department_id INT (11) UNSIGNED NOT NULL
);
ALTER TABLE job_history ADD UNIQUE INDEX (
employee_id,
start_date
);
If everything went well, you should see a confirmation message:
2. Generate the SQL Inserts for the Schema Using MockNeat
The installation page for MockNeat can be found here.
The library is published as a Maven repository in jcenter(), so you can easily include it in your gradle builds:
repositories {
// something else
jcenter()
}
dependencies {
// something else
compile 'net.andreinc.mockneat:mockneat:0.2.3'
}
Or if you prefer maven:
<repositories>
<repository>
<id>jcenter</id>
<url>https://jcenter.bintray.com/</url>
</repository>
</repositories>
<dependencies>
<dependency>
<groupId>net.andreinc.mockneat</groupId>
<artifactId>mockneat</artifactId>
<version>0.2.3</version>
</dependency>
</dependencies>
My recommendation is to always check the official github page or the jcenter() page to get the latest version.
We can start generating the DML Insert Statements using the sqlInserts() method.
The first step is to create a MockNeat instance (or re-using one of the predefined ones):
MockNeat m = MockNeat.threadLocal();
Afterwards we can jump into generating the the actual SQL inserts for the first table: "regions".
// ----- Regions -----
String[] regionNames = new String[] {"Europe", "Americas", "Asia", "Middle East and Africa" };
SQLTable regions = m.sqlInserts()
.tableName("regions")
.column("region_id", m.intSeq().start(1)) // A sequence 1,2,3...
// At each step we iterate for the values in the regionNames[] array
.column("region_name", m.seq(regionNames), TEXT_BACKSLASH)
.table(regionNames.length) // The number of rows
.val();
The SQLTable toString() method is overwritten to generate all the corresponding SQL inserts associated with the table. So if we print the "regions" object, we will obtain something like:
System.out.println(regions);
/**
Output:
INSERT INTO regions (region_id, region_name) VALUES (1, 'Europe');
INSERT INTO regions (region_id, region_name) VALUES (2, 'Americas');
INSERT INTO regions (region_id, region_name) VALUES (3, 'Asia');
INSERT INTO regions (region_id, region_name) VALUES (4, 'Middle East and Africa');
**/
The next step is to generate the data for the "countries" table. MockNeat already defines two dictionaries containing names and iso codes for 241 countries (DictType.COUNTRY_NAME and DictType.COUNTRY_ISO_CODE_2). What we can do is to iterate over the data contained in those dictionaries using the seq() method.
int numCountries = 241;
SQLTable countries = m.sqlInserts()
.tableName("countries")
.column("country_id", m.seq(COUNTRY_ISO_CODE_2), TEXT_BACKSLASH)
.column("country_name", m.seq(COUNTRY_NAME), TEXT_BACKSLASH)
.column("region_id", regions.fromColumn("region_id"))
.table(numCountries)
.val();
For the "region_id", we will randomly take data from the regions table using the fromColumn() method. This approach is particularly useful when we want to generate relational data and enforce Foreign Key constraints.
If we were to print the rows associated with SQLTable, we will obtain something like this:
System.out.println(countries);
/**
Output
NSERT INTO countries (country_id, country_name, region_id) VALUES ('AF', 'Afghanistan', 2);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('AL', 'Albania', 1);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('DZ', 'Algeria', 1);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('AS', 'American Samoa', 4);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('AD', 'Andorra', 3);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('AO', 'Angola', 2);
INSERT INTO countries (country_id, country_name, region_id) VALUES ('AI', 'Anguilla', 3);
....and so on
**/
Generating data for the "locations" table will involve writing our own custom MockUnits (arbitrary data generators).
We will need an arbitrary data generator for the "postal_code" and "street_address" column.
MockUnitString streetAddressGen =
m.fmt("#{num} #{noun} #{end}")
.param("num", m.ints().range(10, 2000))
.param("noun", m.words().nouns().format(CAPITALIZED))
.param("end", m.from(new String[]{"Ave", "St", "Blvd", "Rd"}));
So using the fmt() method, we will be able to generate street addresses with the following format:
System.out.println(streetAddressGen.val());
System.out.println(streetAddressGen.val());
System.out.println(streetAddressGen.val());
/**
Output:
1009 Maroons Blvd
286 Berberine Ave
1454 Eatables Ave
**/
Using a similar approach, we will write our own MockUnit to generate data for the "postal_code" column.
MockUnitString postalCodeGen =
m.fmt("#{word1} #{word2}")
.param("word1", m.strings().size(3).format(UPPER_CASE))
.param("word2", m.strings().size(3).format(UPPER_CASE));
The format of the postal codes is:
System.out.println(postalCodeGen.val());
System.out.println(postalCodeGen.val());
System.out.println(postalCodeGen.val());
/**
Output:
ENO OYW
OTQ NKZ
BCE TB8
**/
Now we can reuse our generators to actually generate data for the "locations" table:
int numLocations = 100;
SQLTable locations =
m.sqlInserts()
.tableName("locations")
.column("location_id", m.intSeq().start(1000).increment(100))
.column("street_address", streetAddressGen, TEXT_BACKSLASH)
.column("postal_code", postalCodeGen, TEXT_BACKSLASH)
.column("city", m.cities().us(), TEXT_BACKSLASH)
.column("state_province", m.cities().capitals(), TEXT_BACKSLASH)
.column("country_id", countries.fromColumn("country_id"), TEXT_BACKSLASH)
.table(numLocations)
.val();
If we were to print the "locations" object, the SQL inserts will look like:
System.out.println(locations);
/**
Output:
INSERT INTO locations (location_id, street_address, postal_code, city, state_province, country_id) VALUES (1000, '15 Olms St', '9Z3 DOA', 'Weaver', 'Ulaanbaatar', 'MS');
INSERT INTO locations (location_id, street_address, postal_code, city, state_province, country_id) VALUES (1100, '205 Dah Rd', 'M4D D7X', 'Land of Pines', 'Baghdad', 'CR');
INSERT INTO locations (location_id, street_address, postal_code, city, state_province, country_id) VALUES (1200, '848 Mirliton Blvd', 'JSA ARO', 'Millersburg', 'Prague', 'MT');
INSERT INTO locations (location_id, street_address, postal_code, city, state_province, country_id) VALUES (1300, '1585 Supplementer Rd', '97R RF7', 'Sparta', 'São Tomé', 'PY');
... and so on
**/
The next step is to start generating "departments" information. This will be quite straightforward. The only prerequisite will be to have a list of manager IDs already generated. We will use this list for other tables in the future.
// Manager ids
// Generating a list of 20 manager ids in the range [1..1000).
List<Integer> managerIds = m.ints()
.range(1, 1000)
.list(() -> new ArrayList<>(), 20)
.val();
// We will have 10 departments
int depNum = 10;
SQLTable departments = m.sqlInserts()
.tableName("departments")
.column("department_id", m.intSeq().start(0).increment(10))
.column("department_name",m.seq(DEPARTMENTS), TEXT_BACKSLASH)
.column("manager_id", m.from(managerIds))
.column("location_id", locations.fromColumn("location_id"))
.table(depNum)
.val();
After the departments are generated, we can start generating the actual "jobs":
// ----- JOBS -----
String[] jobNames = new String[] {"Analyst",
"Consultant",
"Senior Consultant",
"Manager",
"Software Architect",
"Senior Manager",
"Director"};
String[] jobIds = new String[] { "A",
"C",
"SC",
"M",
"SA",
"SM",
"D" };
int numJobs = jobNames.length;
int minSalary = 2000;
int maxSalary = 5000;
SQLTable jobs = m.sqlInserts()
.tableName("jobs")
.column("job_id", m.seq(jobIds), TEXT_BACKSLASH)
.column("job_title", m.seq(jobNames), TEXT_BACKSLASH)
.column("min_salary", minSalary + "")
.column("max_salary", maxSalary + "")
.table(numJobs)
.val();
System.out.println(jobs);
The last and the most complex table is the "employees". Here, we will need to write additional coding logic to make sure every employee has a managerId associated with it.
// ----- EMPLOYEES -----
int numEmployes = 1000;
SQLTable employees = m.sqlInserts()
.tableName("employees")
.column("employee_id", m.intSeq())
.column("first_name", m.names().first(), TEXT_BACKSLASH)
.column("last_name", m.names().last(), TEXT_BACKSLASH)
.column("email", m.emails().domain("corp.com"), TEXT_BACKSLASH)
.column("phone_number", m.regex("\\+30 [0-9]{9}"), TEXT_BACKSLASH)
.column("hire_date",
m.localDates()
.past(LocalDate.of(2000, 1, 1))
.display(BASIC_ISO_DATE), TEXT_BACKSLASH)
.column("job_id", jobs.fromColumn("job_id"), TEXT_BACKSLASH)
.column("salary", m.ints().range(minSalary, maxSalary))
.column("commission_pct", "NULL")
.column("manager_id", "NULL" /* TO UPDATE LATER */)
.column("department_id", departments.fromColumn("department_id"))
.table(numEmployes)
.val();
// Updating employees with their manager ID
// - One person has no manager ID - it should be a director level person
// - Manager id and employee_id should be different
// We update all people with a manager_id
employees.updateAll((i, insert) -> {
Integer employeeId = parseInt(insert.getValue("employee_id"));
Integer managerId = Integer.MIN_VALUE;
while(employeeId == (managerId = m.from(managerIds).val()));
insert.setValue("manager_id", managerId + "");
});
// One of the directors doesn't have a manager id as he is the CO
employees.selectFirstWhere(sqlInsert -> sqlInsert.getValue("job_id").equals("D"))
.get()
.setValue("manager_id", "NULL");
System.out.println(employees);
3. Putting It All Together
If we were to put all together, the resulting script would look like:
import net.andreinc.mockneat.MockNeat;
import net.andreinc.mockneat.abstraction.MockUnitString;
import net.andreinc.mockneat.unit.text.sql.SQLTable;
import org.junit.Test;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import static java.lang.Integer.parseInt;
import static java.time.format.DateTimeFormatter.BASIC_ISO_DATE;
import static net.andreinc.mockneat.types.enums.DictType.COUNTRY_ISO_CODE_2;
import static net.andreinc.mockneat.types.enums.DictType.COUNTRY_NAME;
import static net.andreinc.mockneat.types.enums.DictType.DEPARTMENTS;
import static net.andreinc.mockneat.types.enums.StringFormatType.CAPITALIZED;
import static net.andreinc.mockneat.types.enums.StringFormatType.UPPER_CASE;
import static net.andreinc.mockneat.unit.text.sql.escapers.MySQL.TEXT_BACKSLASH;
public class SQLInsertsTest {
public static void main(String[] args) {
MockNeat m = MockNeat.threadLocal();
// ----- Regions -----
String[] regionNames = new String[] {"Europe",
"Americas",
"Asia",
"Middle East and Africa" };
SQLTable regions = m.sqlInserts()
.tableName("regions")
.column("region_id",
m.intSeq().start(1))
.column("region_name",
m.seq(regionNames), TEXT_BACKSLASH)
.table(regionNames.length)
.val();
System.out.println(regions);
// ----- Countries -----
int numCountries = 241;
SQLTable countries = m.sqlInserts()
.tableName("countries")
.column("country_id", m.seq(COUNTRY_ISO_CODE_2), TEXT_BACKSLASH)
.column("country_name", m.seq(COUNTRY_NAME), TEXT_BACKSLASH)
.column("region_id", regions.fromColumn("region_id"))
.table(numCountries)
.val();
System.out.println(countries);
// ----- Locations -----
int numLocations = 100;
MockUnitString streetAddressGen = m.fmt("#{num} #{noun} #{end}")
.param("num",
m.ints().range(10, 2000))
.param("noun",
m.words().nouns().format(CAPITALIZED))
.param("end",
m.from(new String[]{"Ave", "St", "Blvd", "Rd"}));
MockUnitString postalCodeGen = m.fmt("#{word1} #{word2}")
.param("word1",
m.strings().size(3).format(UPPER_CASE))
.param("word2",
m.strings().size(3).format(UPPER_CASE));
SQLTable locations = m.sqlInserts()
.tableName("locations")
.column("location_id",
m.intSeq().start(1000).increment(100))
.column("street_address",
streetAddressGen, TEXT_BACKSLASH)
.column("postal_code",
postalCodeGen, TEXT_BACKSLASH)
.column("city",
m.cities().us(), TEXT_BACKSLASH)
.column("state_province",
m.cities().capitals(), TEXT_BACKSLASH)
.column("country_id",
countries.fromColumn("country_id"), TEXT_BACKSLASH)
.table(numLocations)
.val();
System.out.println(locations);
// ----- Departments -----
// Manager ids
List<Integer> managerIds = m.ints()
.range(1, 1000)
.list(() -> new ArrayList<>(), 20)
.val();
int depNum = 10;
SQLTable departments = m.sqlInserts()
.tableName("departments")
.column("department_id",
m.intSeq().start(0).increment(10))
.column("department_name",
m.seq(DEPARTMENTS), TEXT_BACKSLASH)
.column("manager_id",
m.from(managerIds))
.column("location_id",
locations.fromColumn("location_id"))
.table(depNum)
.val();
System.out.println(departments);
// ----- JOBS -----
String[] jobNames = new String[] {"Analyst",
"Consultant",
"Senior Consultant",
"Manager",
"Software Architect",
"Senior Manager",
"Director"};
String[] jobIds = new String[] { "A",
"C",
"SC",
"M",
"SA",
"SM",
"D" };
int numJobs = jobNames.length;
int minSalary = 2000;
int maxSalary = 5000;
SQLTable jobs = m.sqlInserts()
.tableName("jobs")
.column("job_id",
m.seq(jobIds), TEXT_BACKSLASH)
.column("job_title",
m.seq(jobNames), TEXT_BACKSLASH)
.column("min_salary",
minSalary + "")
.column("max_salary",
maxSalary + "")
.table(numJobs)
.val();
System.out.println(jobs);
// ----- EMPLOYEES -----
int numEmployes = 1000;
SQLTable employees = m.sqlInserts()
.tableName("employees")
.column("employee_id",
m.intSeq())
.column("first_name",
m.names().first(), TEXT_BACKSLASH)
.column("last_name",
m.names().last(), TEXT_BACKSLASH)
.column("email",
m.emails().domain("corp.com"), TEXT_BACKSLASH)
.column("phone_number",
m.regex("\\+30 [0-9]{9}"), TEXT_BACKSLASH)
.column("hire_date",
m.localDates()
.past(LocalDate.of(2000, 1, 1))
.display(BASIC_ISO_DATE), TEXT_BACKSLASH)
.column("job_id",
jobs.fromColumn("job_id"), TEXT_BACKSLASH)
.column("salary",
m.ints().range(minSalary, maxSalary))
.column("commission_pct",
"NULL")
.column("manager_id",
"NULL" /* TO UPDATE LATER */)
.column("department_id",
departments.fromColumn("department_id"))
.table(numEmployes)
.val();
// Updating employees with their manager ID
// - One person has no manager ID - it should be a director level person
// - Manager id and employee_id should be different
// We update all people with a manager_id
employees.updateAll((i, insert) -> {
Integer employeeId = parseInt(insert.getValue("employee_id"));
Integer managerId = Integer.MIN_VALUE;
while(employeeId == (managerId = m.from(managerIds).val()));
insert.setValue("manager_id", managerId + "");
});
// One of the directors doesn't have a manager id as he is the CO
employees.selectFirstWhere(sqlInsert -> sqlInsert.getValue("job_id").equals("D"))
.get()
.setValue("manager_id", "NULL");
System.out.println(employees);
}
}
Thanks for reading! Let me know what you think by writing in the comments section!
Opinions expressed by DZone contributors are their own.
Comments