Datafaker: An Alternative to Using Production Data

As developers or testers, we frequently have the need to test our systems. But getting access to realistic or useful data isn't always easy.

By Erik Pragt · May. 22, 22 · Tutorial
As developers or testers, we frequently need to test our systems. In this process, be it unit testing, integration testing, or any other form of testing, the data is often the leading and deciding factor. But getting access to good data isn't always easy. Sometimes the data is quite sensitive, like medical or financial data. At other times, there's not enough data (for example, when attempting a load test), or the data you're looking for is hard to find. For cases like these, there's a solution called Datafaker.

Datafaker is a JVM library for generating production-like fake data. This data can be generated as part of your unit tests or in the form of external files, such as CSV or JSON files, so it can serve as the input to other systems. This article will show you what Datafaker is, what it can do, and how you can use it effectively to improve your testing strategy.

What Is Datafaker?

Datafaker is a library written in Java that can be used from popular JVM languages such as Java, Kotlin, or Groovy. It started as a fork of the no-longer-maintained Javafaker, and it has seen many improvements since. Datafaker consists of a core that handles the generation of data, with a wide variety of domain-specific data providers on top. Such providers can be very useful, for example, to generate real-looking addresses, names, phone numbers, credit cards, and other data; others are a bit more lighthearted, such as those generating characters from the TV shows Friends or The IT Crowd. No matter your use case, there's a high chance that Datafaker can provide data to your application. And when no suitable provider is available, Datafaker offers a pluggable system to create your own providers!

How to Use Datafaker

Datafaker is published to Maven Central on a regular basis, so the easiest way to get started with Datafaker is to use a dependency management tool like Maven or Gradle. To get started with Datafaker using Maven, you can include the dependency as follows:

XML
 
<dependency>
    <groupId>net.datafaker</groupId>
    <artifactId>datafaker</artifactId>
    <version>1.4.0</version>
</dependency>


Above, we're using version 1.4.0, the latest version at the time of writing this article. To make sure you're using the latest version, please check Maven Central.

Once the library has been included in your project, the easiest way to generate data is as follows:

Java
 
import net.datafaker.Faker;

Faker faker = new Faker();
System.out.println(faker.name().fullName()); // Printed Vicky Nolan


If you need more information, the Datafaker documentation includes an excellent getting-started guide.

A few things are going on here that may not be immediately visible. Whenever you run the above code, it prints a random full name, consisting of a first name and a last name. Because the example uses the default locale (English) and a random seed, the name will be different on every run. If we want something more predictable, and perhaps a different language, we can specify both:

Java
 
import java.util.Locale;
import java.util.Random;

long seed = 1;
Faker faker = new Faker(new Locale("nl"), new Random(seed));
System.out.println(faker.name().fullName());


In the above example, we generate a Dutch full name. Because the seed is now fixed, the program produces the same sequence of values no matter how often we run it. This helps a great deal when test data needs to be repeatable, for example in a regression test.
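
To see why a fixed seed makes the output repeatable, the idea can be sketched with plain `java.util.Random`, without Datafaker. The `SeededNames` class and its list of names are hypothetical and exist only for illustration; this is not how Datafaker stores or selects its data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, not part of Datafaker: shows why a fixed seed gives repeatable output.
class SeededNames {
    private static final String[] FIRST_NAMES = {"Anna", "Bram", "Daan", "Emma", "Sanne"};

    // Draws `count` names using a Random created from the given seed.
    static List<String> sample(long seed, int count) {
        Random random = new Random(seed);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add(FIRST_NAMES[random.nextInt(FIRST_NAMES.length)]);
        }
        return names;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: both lines print identical lists.
        System.out.println(sample(1L, 3));
        System.out.println(sample(1L, 3));
    }
}
```

Two `Random` instances created with the same seed return the same sequence of numbers, so any data derived from them in the same order is identical as well.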

While the above example shows how to generate names, it's possible to generate a very wide range of random data, such as addresses, phone numbers, credit cards, colors, and codes. A full list can be found in the documentation (https://www.datafaker.net/documentation/providers/). Besides these, Datafaker also provides more technical options, such as random enums and lists, to make it easier to generate your random test data.

Custom Collections

In case you need to generate a larger set of test data, Datafaker provides several options to do so. One of these options is to use Fake Collections. Fake collections allow the creation of large sets of data in memory by providing a set of data suppliers to the collection method. This is best demonstrated using an example:

Java
 
List<String> names = faker.<String>collection()
    .suppliers(
        () -> faker.name().firstName(),
        () -> faker.name().lastName())
    .minLen(5)
    .maxLen(10)
    .build().get();


The above will create a collection of Strings containing at least 5 and at most 10 elements. Each element will be either a first name or a last name. It's possible to create many variations of the above, and similar examples are possible even when the data types are different:

Java
 
import java.util.concurrent.TimeUnit;

List<Object> data = faker.collection()
    .suppliers(
        () -> faker.date().future(10, TimeUnit.DAYS),
        () -> faker.medical().hospitalName(),
        () -> faker.number().numberBetween(10, 50))
    .minLen(5)
    .maxLen(10)
    .build().get();

System.out.println(data);


This will generate a list of Objects, since the `future`, `hospitalName` and `numberBetween` generators all have different return types.
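
The idea behind fake collections can be sketched in plain Java as follows. This is not Datafaker's implementation: `SupplierCollection` and `generate` are hypothetical names, and the sketch assumes that one supplier is picked at random for every element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.Supplier;

// Illustrative only: a hand-rolled stand-in for the idea behind fake collections.
class SupplierCollection {
    // Builds a list whose size lies in [minLen, maxLen];
    // each element comes from a randomly chosen supplier (an assumption of this sketch).
    static <T> List<T> generate(Random random, int minLen, int maxLen, List<Supplier<T>> suppliers) {
        int size = minLen + random.nextInt(maxLen - minLen + 1);
        List<T> result = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            Supplier<T> supplier = suppliers.get(random.nextInt(suppliers.size()));
            result.add(supplier.get());
        }
        return result;
    }

    public static void main(String[] args) {
        Random random = new Random();
        // Suppliers with different return types force the element type to Object.
        List<Supplier<Object>> suppliers = List.of(
            () -> "hospital-" + random.nextInt(100),
            () -> 10 + random.nextInt(40));
        System.out.println(generate(random, 5, 10, suppliers));
    }
}
```

As in the Datafaker example, mixing suppliers of different types means the resulting list can only be typed as a list of `Object`.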

Custom Providers

While Datafaker provides a lot of generators out of the box, it's possible that a generator you need is missing, or that an existing generator works slightly differently from what your use case requires. To support cases like this, it's possible to create your own data provider, either by providing a YML configuration file or by hardcoding the possible values in your code.

To create a provider of data, there are two steps involved: creating the data provider and registering the data provider in your custom Faker. An example can be found below, in which we'll create a specific provider for generating turtle names:

Java
 
class Turtle {
    private static final String[] TURTLE_NAMES = new String[]{"Leonardo", "Raphael", "Donatello", "Michelangelo"};
    private final Faker faker;

    public Turtle(Faker faker) {
        this.faker = faker;
    }

    public String name() {
        return TURTLE_NAMES[faker.random().nextInt(TURTLE_NAMES.length)];
    }
}


Since the methods to access providers are fixed in the Faker class, we need to create our own custom Faker class, which extends the original Faker class so we can use all existing data providers, plus our own:

Java
 
class MyCustomFaker extends Faker {
    public Turtle turtle() {
        return getProvider(Turtle.class, () -> new Turtle(this));
    }
}


Using the custom faker is similar to what we've seen before:

Java
 
MyCustomFaker faker = new MyCustomFaker();
System.out.println(faker.turtle().name());


If you want to know more about creating your own provider, or using YML files to provide the data, the Datafaker custom provider documentation provides more information on this subject.

Exporting Data

Sometimes, you want to do more than generate the data in memory, and you might need to provide some data to an external program. A commonly used approach for this would be to provide the data in CSV files. Datafaker provides such a feature out of the box, and besides generating CSV files, it also has the option to generate JSON, YML, or XML files without the need for external libraries. Creating such data is similar to creating collections of data, which we've seen above.

Files can be generated in several ways. To generate a CSV document with random data, use the `toCsv` method of the `Format` class. An example can be found below:

Java
 
System.out.println(
    Format.toCsv(
            Csv.Column.of("first_name", () -> faker.name().firstName()),
            Csv.Column.of("last_name", () -> faker.name().lastName()),
            Csv.Column.of("address", () -> faker.address().streetAddress()))
        .header(true)
        .separator(",")
        .limit(5).build().get());


In the example above, 5 rows of data are generated, and each row consists of a first name, last name, and street address. It's possible to customize the generation of the CSV, for example by including or excluding the header, or by using a different separator character. More information on the different options, and examples of how to generate XML, YML, or JSON files, can be found in the Datafaker file formats documentation.
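
To make the header, separator, and row limit options more concrete, here is a minimal plain-Java sketch of a CSV builder driven by column suppliers. It is not Datafaker's `Format` class: `CsvSketch`, `toCsv`, and `quote` are hypothetical names, and the quoting rule (wrap every value in double quotes, double any embedded quote) is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Supplier;

// Illustrative only: a hand-written CSV builder, not Datafaker's Format class.
class CsvSketch {
    // Renders `rows` lines; column order follows the map's insertion order.
    static String toCsv(Map<String, Supplier<String>> columns, boolean header, String separator, int rows) {
        StringBuilder csv = new StringBuilder();
        if (header) {
            StringJoiner names = new StringJoiner(separator);
            columns.keySet().forEach(name -> names.add(quote(name)));
            csv.append(names).append('\n');
        }
        for (int i = 0; i < rows; i++) {
            StringJoiner line = new StringJoiner(separator);
            // Each supplier is called once per row, so every row gets fresh values.
            columns.values().forEach(value -> line.add(quote(value.get())));
            csv.append(line).append('\n');
        }
        return csv.toString();
    }

    // Wraps a value in double quotes and doubles any embedded quote.
    private static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> columns = new LinkedHashMap<>();
        columns.put("first_name", () -> "Vicky");
        columns.put("last_name", () -> "Nolan");
        System.out.print(toCsv(columns, true, ",", 2));
    }
}
```

In a real test, the constant suppliers in `main` would be replaced by calls into a data generator, so that every row differs.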

Exporting Data With Some Constraints

CSV data can also be generated conditionally, that is, with constraints between the values in a row. Imagine we want to generate a document containing a person's name, a field of interest, and a sample from that field. For the sake of simplicity, we consider two fields of interest: Music and Food. For "Music" we want to see a sample music genre, and for "Food" a sample dish, e.g.:

Plain Text
 
"name";"field";"sample"
"Le Ferry";"Music";"Funk"
"Mrs. Florentino Schuster";"Food";"Scotch Eggs"


To do that, we need to generate a collection of such data.

First, let's define the rules for generating the objects, for instance:

Java
 
class Data {
    private Faker faker = new Faker();
    
    private String name;
    private String field;
    private String interestSample;

    public Data() {        
        name = faker.name().name();
        field = faker.options().option("Music", "Food");
        switch (field) {
            case "Music": interestSample = faker.music().genre(); break;
            case "Food": interestSample = faker.food().dish(); break;
        }
    }

    public String getName() {
        return name;
    }

    public String getField() {
        return field;
    }

    public String getInterestSample() {
        return interestSample;
    }
}


Now we can use the Data class to generate CSV data, as demonstrated below:

Java
 
String csv = Format.toCsv(
        new Faker().<Data>collection()
            .suppliers(Data::new)
            .maxLen(10)
            .build())
    .headers(() -> "name", () -> "field", () -> "sample")
    .columns(Data::getName, Data::getField, Data::getInterestSample)
    .separator(";")
    .header(true)
    .build().get();


This will generate a CSV string with headers and columns containing random data, but with constraints between the columns we specified.

Conclusion

This article gave an overview of some of the options provided by Datafaker and how Datafaker can help in addressing your testing needs.

For suggestions, bug reports, or other feedback, head over to the Datafaker project site.
