Python: Create Fake Data with Faker

Every once in a while, I run into a situation where I need dummy data to test my code against. If you test against a new database or table, you will need dummy data often. I recently came across an interesting package called Faker. Faker’s sole purpose is to create semi-random fake data: names, addresses, browser user agents, domains, paragraphs and much more. We will spend a few moments in this article demonstrating some of Faker’s capabilities.

Getting Started

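First of all, you will need to install Faker. If you have pip (and why wouldn’t you?), all you need to do is this:

pip install Faker

The package was originally published on PyPI as fake-factory; that name has since been retired in favor of Faker.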

Now that you have the package installed, we can start using it!

Creating Fake Data

Creating fake data with Faker is really easy to do. Let’s look at a few examples. We will start with a couple of examples that create fake names:

from faker import Factory
def create_names(fake):
    # Print 10 randomly generated full names
    for i in range(10):
        print(fake.name())
if __name__ == "__main__":
    fake = Factory.create()
    create_names(fake)

If you run the code above, you will see 10 different names printed to stdout. This is what I got when I ran it:

Mrs. Terese Walter MD
Jess Mayert
Ms. Katerina Fisher PhD
Mrs. Senora Purdy PhD
Gretchen Tromp
Winnie Goodwin
Yuridia McGlynn MD
Betty Kub
Nolen Koelpin
Adilene Jerde

You will likely receive something different. Every time I’ve run the script, the results were never the same. Most of the time, I don’t want the name to have a prefix or a suffix, so I created another script that only produces a first and last name:

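from faker import Factory
def create_names2(fake):
    # Build each name from a first and last name only (no prefix or suffix)
    for i in range(10):
        name = "%s %s" % (fake.first_name(),
                          fake.last_name())
        print(name)
if __name__ == "__main__":
    fake = Factory.create()
    create_names2(fake)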

If you run this second script, the names you see should not contain a prefix (e.g. Ms., Mr.) or a suffix (e.g. PhD, Jr.). Let’s take a look at some of the other types of fake data that we can generate with this package.

Creating Other Fake Stuff

Now we’ll spend a few moments learning about some of the other fake data that Faker can generate. The following piece of code will create six pieces of fake data. Let’s take a look:

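from faker import Factory
def create_fake_stuff(fake):
    # Names of the Faker methods to call
    stuff = ["email", "bs", "address",
             "city", "state",
             "paragraph"]
    for item in stuff:
        # Look up each method by name and call it
        print("%s = %s" % (item, getattr(fake, item)()))
if __name__ == "__main__":
    fake = Factory.create()
    create_fake_stuff(fake)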

Here we use Python’s built-in getattr function to call some of Faker’s methods. When I ran this script, I received the following for output:

email = pacocha.aria@kris.com
bs = reinvent collaborative systems
address = 57188 Leuschke Mission
Lake Jaceystad, KY 46291
city = West Luvinialand
state = Oregon
paragraph = Possimus nostrum exercitationem harum eum in. Dicta aut officiis qui deserunt voluptas ullam ut. Laborum molestias voluptatem consequatur laboriosam. Omnis est cumque culpa quo illum.
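The getattr lookup used in the script above works on any Python object, so it can be tried without Faker installed. The following is a minimal sketch of the same technique: StubFake and collect are hypothetical names invented for this illustration and are not part of Faker. It also shows one way to handle a method name that does not exist, which the original script does not do.

```python
class StubFake(object):
    """Hypothetical stand-in for a Faker generator (not part of Faker)."""

    def email(self):
        return "user@example.com"

    def city(self):
        return "Springfield"


def collect(fake, method_names):
    """Call each named method on `fake` and return a dict of name -> result."""
    results = {}
    for name in method_names:
        # getattr(fake, "email") returns the bound method fake.email;
        # passing None as the default avoids an AttributeError for unknown names.
        method = getattr(fake, name, None)
        if callable(method):
            results[name] = method()
        else:
            # Unknown names are recorded as None instead of raising
            results[name] = None
    return results


if __name__ == "__main__":
    print(collect(StubFake(), ["email", "city", "no_such_method"]))
```

Because collect only relies on looking up methods by name, passing it a real generator from Factory.create() in place of StubFake should work the same way.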

Wasn’t that fun?

Wrapping Up

The Faker package has many other methods that are not covered here. You should check out their full documentation to see what else you can do with this package. With a little work, you can use this package to populate a database or a report quite easily.

Published at DZone with permission of Mike Driscoll, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
