FakeIt Series (Part 4 of 5): Working With Existing Data
Working with existing data is a powerful FakeIt feature. It maintains the integrity of randomly generated documents and transforms and imports existing data.
So far in our FakeIt series, we’ve seen how we can generate fake data, share data and dependencies, and use definitions for smaller models. Today, we are going to look at the last major feature of FakeIt, which is working with existing data through inputs.
Rarely as developers do we get the advantage of working on greenfield applications. Our domains are more often than not comprised of different legacy databases and applications. As we are modeling and building new applications, we need to reference and use this existing data. FakeIt allows you to provide existing data to your models through JSON, CSV, or CSON files. This data is exposed as an inputs variable in each of a model's *run and *build functions.
Users Model
We will start with the users.yaml model that we updated in our most recent post to use Address and Phone definitions.
name: Users
type: object
key: _id
data:
  min: 1000
  max: 2000
properties:
  _id:
    type: string
    description: The document id built by the prefix "user_" and the users id
    data:
      post_build: "`user_${this.user_id}`"
  doc_type:
    type: string
    description: The document type
    data:
      value: "user"
  user_id:
    type: integer
    description: An auto-incrementing number
    data:
      build: document_index
  first_name:
    type: string
    description: The users first name
    data:
      build: faker.name.firstName()
  last_name:
    type: string
    description: The users last name
    data:
      build: faker.name.lastName()
  username:
    type: string
    description: The username
    data:
      build: faker.internet.userName()
  password:
    type: string
    description: The users password
    data:
      build: faker.internet.password()
  email_address:
    type: string
    description: The users email address
    data:
      build: faker.internet.email()
  created_on:
    type: integer
    description: An epoch time of when the user was created
    data:
      build: new Date(faker.date.past()).getTime()
  addresses:
    type: object
    description: An object containing the home and work addresses for the user
    properties:
      home:
        description: The users home address
        schema:
          $ref: '#/definitions/Address'
      work:
        description: The users work address
        schema:
          $ref: '#/definitions/Address'
  main_phone:
    description: The users main phone number
    schema:
      $ref: '#/definitions/Phone'
    data:
      post_build: |
        delete this.main_phone.type
        return this.main_phone
  additional_phones:
    type: array
    description: The users additional phone numbers
    items:
      $ref: '#/definitions/Phone'
      data:
        min: 1
        max: 4
definitions:
  Phone:
    type: object
    properties:
      type:
        type: string
        description: The phone type
        data:
          build: faker.random.arrayElement([ 'Home', 'Work', 'Mobile', 'Other' ])
      phone_number:
        type: string
        description: The phone number
        data:
          build: faker.phone.phoneNumber().replace(/[^0-9]+/g, '')
      extension:
        type: string
        description: The phone extension
        data:
          build: chance.bool({ likelihood: 30 }) ? chance.integer({ min: 1000, max: 9999 }) : null
  Address:
    type: object
    properties:
      address_1:
        type: string
        description: The address 1
        data:
          build: `${faker.address.streetAddress()} ${faker.address.streetSuffix()}`
      address_2:
        type: string
        description: The address 2
        data:
          build: chance.bool({ likelihood: 35 }) ? faker.address.secondaryAddress() : null
      locality:
        type: string
        description: The city / locality
        data:
          build: faker.address.city()
      region:
        type: string
        description: The region / state / province
        data:
          build: faker.address.stateAbbr()
      postal_code:
        type: string
        description: The zip code / postal code
        data:
          build: faker.address.zipCode()
      country:
        type: string
        description: The country code
        data:
          build: faker.address.countryCode()
Currently, our Address definition is generating a random country. What if our e-commerce site only supports a small subset of the 195 countries? Let’s say we support six countries to start with: US, CA, MX, UK, ES, and DE. We could update the definition's country property to grab a random array element:
For brevity, the other properties have been left off of the model definition.
...
country:
  type: string
  description: The country code
  data:
    build: faker.random.arrayElement(['US', 'CA', 'MX', 'UK', 'ES', 'DE']);
While this would work, what if we have other models that rely on this same country info? We would have to duplicate this logic. We can achieve the same thing by creating a countries.json file and adding an inputs property to the data property, which can be an absolute or relative path to our input. When our model is generated, our countries.json file will be exposed to each of the model's build functions via the inputs argument as inputs.countries.
For brevity, the other properties have been left off of the model definition.
name: Users
type: object
key: _id
data:
  min: 1000
  max: 2000
  inputs: ./countries.json
properties:
  ...
definitions:
  ...
      country:
        type: string
        description: The country code
        data:
          build: faker.random.arrayElement(inputs.countries);
countries.json
[
"US",
"CA",
"MX",
"UK",
"ES",
"DE"
]
By changing one existing line and adding another line in the model, we have provided existing data to our Users model. We can still generate a random country based on the countries our application supports. Let's test our changes by using the following command:
fakeit console --count 1 models/users.yaml
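To make the inputs.countries lookup more concrete, here is a rough sketch in plain JavaScript of what the country build function ends up doing. It is not FakeIt's source code: the inputKey and arrayElement helpers are hypothetical stand-ins written for this example, and the contents of countries.json are inlined so the snippet runs on its own.

```javascript
const path = require('path');

// Hypothetical helper: derive the inputs key from the file name,
// mirroring how countries.json shows up as inputs.countries.
function inputKey(filePath) {
  return path.basename(filePath, path.extname(filePath));
}

// Contents of countries.json, inlined so the sketch is self-contained.
const countriesJson = '["US", "CA", "MX", "UK", "ES", "DE"]';

// Assumed shape of the inputs argument: one property per input file.
const inputs = { [inputKey('./countries.json')]: JSON.parse(countriesJson) };

// Plain-JavaScript stand-in for faker.random.arrayElement().
function arrayElement(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Equivalent of: build: faker.random.arrayElement(inputs.countries);
const country = arrayElement(inputs.countries);
console.log(country);
```

Because the list lives in one file, any other model that needs the supported countries can point its own inputs property at the same countries.json instead of repeating the array.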
Products Model
Our e-commerce application is using a separate system for categorization. We need to expose that data to our randomly generated products so that we are using valid category information. We will start with the products.yaml that we defined in the FakeIt Series (Part 2 of 5): Shared Data and Dependencies post.
products.yaml
name: Products
type: object
key: _id
data:
  min: 4000
  max: 5000
properties:
  _id:
    type: string
    description: The document id
    data:
      post_build: `product_${this.product_id}`
  doc_type:
    type: string
    description: The document type
    data:
      value: product
  product_id:
    type: string
    description: Unique identifier representing a specific product
    data:
      build: faker.random.uuid()
  price:
    type: double
    description: The product price
    data:
      build: chance.floating({ min: 0, max: 150, fixed: 2 })
  sale_price:
    type: double
    description: The product price
    data:
      post_build: |
        let sale_price = 0;
        if (chance.bool({ likelihood: 30 })) {
          sale_price = chance.floating({ min: 0, max: this.price * chance.floating({ min: 0, max: 0.99, fixed: 2 }), fixed: 2 });
        }
        return sale_price;
  display_name:
    type: string
    description: Display name of product.
    data:
      build: faker.commerce.productName()
  short_description:
    type: string
    description: Description of product.
    data:
      build: faker.lorem.paragraphs(1)
  long_description:
    type: string
    description: Description of product.
    data:
      build: faker.lorem.paragraphs(5)
  keywords:
    type: array
    description: An array of keywords
    items:
      type: string
      data:
        min: 0
        max: 10
        build: faker.random.word()
  availability:
    type: string
    description: The availability status of the product
    data:
      build: |
        let availability = 'In-Stock';
        if (chance.bool({ likelihood: 40 })) {
          availability = faker.random.arrayElement([ 'Preorder', 'Out of Stock', 'Discontinued' ]);
        }
        return availability;
  availability_date:
    type: integer
    description: An epoch time of when the product is available
    data:
      build: faker.date.recent()
      post_build: new Date(this.availability_date).getTime()
  product_slug:
    type: string
    description: The URL friendly version of the product name
    data:
      post_build: faker.helpers.slugify(this.display_name).toLowerCase()
  category:
    type: string
    description: Category for the Product
    data:
      build: faker.commerce.department()
  category_slug:
    type: string
    description: The URL friendly version of the category name
    data:
      post_build: faker.helpers.slugify(this.category).toLowerCase()
  image:
    type: string
    description: Image URL representing the product.
    data:
      build: faker.image.image()
  alternate_images:
    type: array
    description: An array of alternate images for the product
    items:
      type: string
      data:
        min: 0
        max: 4
        build: faker.image.image()
Our existing categories data has been provided in CSV format.
categories.csv
"category_id","category_name","category_slug"
23,"Electronics","electronics"
1032,"Office Supplies","office-supplies"
983,"Clothing & Apparel","clothing-and-apparel"
483,"Movies, Music & Books","movies-music-and-books"
3023,"Sports & Fitness","sports-and-fitness"
4935,"Automotive","automotive"
923,"Tools","tools"
5782,"Home Furniture","home-furniture"
9783,"Health & Beauty","health-and-beauty"
2537,"Toys","toys"
10,"Video Games","video-games"
736,"Pet Supplies","pet-supplies"
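As an illustration of the shape this file takes once it is loaded, the sketch below parses a few of the rows above into an array of objects keyed by the header names. The parseCsvLine and parseCsv functions are hypothetical helpers written for this example, not FakeIt's parser, and FakeIt may type the values differently (this sketch keeps every value, including category_id, as a string).

```javascript
// Split one CSV line into fields, honoring double-quoted values
// so that "Movies, Music & Books" stays a single field.
function parseCsvLine(line) {
  const fields = [];
  let current = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        current += '"'; // escaped quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        current += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(current);
      current = '';
    } else {
      current += ch;
    }
  }
  fields.push(current);
  return fields;
}

// Turn CSV text into an array of objects keyed by the header row.
function parseCsv(text) {
  const [headerLine, ...rows] = text.trim().split(/\r?\n/);
  const headers = parseCsvLine(headerLine);
  return rows.map((row) => {
    const values = parseCsvLine(row);
    const record = {};
    headers.forEach((header, index) => {
      record[header] = values[index];
    });
    return record;
  });
}

// A few rows of categories.csv, inlined so the sketch runs on its own.
const csv = [
  '"category_id","category_name","category_slug"',
  '23,"Electronics","electronics"',
  '483,"Movies, Music & Books","movies-music-and-books"',
].join('\n');

const categories = parseCsv(csv);
console.log(categories);
```

Each element of the resulting array carries category_id, category_name, and category_slug properties, which are the property names the model's build functions read from.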
Now we need to update our products.yaml model to use this existing data.
For brevity, the other properties have been left off of the model definition.
name: Products
type: object
key: _id
data:
  min: 4000
  max: 5000
  inputs:
    - ./categories.csv
  pre_build: globals.current_category = faker.random.arrayElement(inputs.categories);
properties:
  ...
  category_id:
    type: integer
    description: The Category ID for the Product
    data:
      build: globals.current_category.category_id
  category:
    type: string
    description: Category for the Product
    data:
      build: globals.current_category.category_name
  category_slug:
    type: string
    description: The URL friendly version of the category name
    data:
      post_build: globals.current_category.category_slug
  ...
There are a few things to notice about how we’ve updated our products.yaml model.
- inputs: is defined as an array, not a string. While we are only using a single input, you can provide as many input files to your model as necessary.
- A pre_build function is defined at the root of the model. This is because we cannot grab a random array element for each of our three category properties, as the values would not match. Each time an individual document is generated for our model, this pre_build function will run first.
- Each of our category properties' build functions references the global variable set by the pre_build function on our model.
We can test our changes by using the following command:
fakeit console --count 1 models/products.yaml
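The reason for picking the category once in pre_build can be shown with a small simulation in plain JavaScript. The names globals, inputs, and current_category mirror the model, but the preBuild, buildProduct, and arrayElement functions are hypothetical and only approximate the per-document flow described above. They are not FakeIt's implementation.

```javascript
// A subset of the categories input, inlined so the sketch is self-contained.
const inputs = {
  categories: [
    { category_id: 23, category_name: 'Electronics', category_slug: 'electronics' },
    { category_id: 923, category_name: 'Tools', category_slug: 'tools' },
    { category_id: 2537, category_name: 'Toys', category_slug: 'toys' },
  ],
};

// Shared state that the model-level pre_build writes and the properties read.
const globals = {};

// Plain-JavaScript stand-in for faker.random.arrayElement().
function arrayElement(list) {
  return list[Math.floor(Math.random() * list.length)];
}

// Runs once per document, before any property is built.
function preBuild() {
  globals.current_category = arrayElement(inputs.categories);
}

// Each category property reads from the same globals.current_category,
// so the id, name, and slug always describe the same category.
function buildProduct() {
  preBuild();
  return {
    category_id: globals.current_category.category_id,
    category: globals.current_category.category_name,
    category_slug: globals.current_category.category_slug,
  };
}

const products = Array.from({ length: 5 }, buildProduct);
console.log(products);
```

If each property called arrayElement(inputs.categories) on its own instead, a document could end up with the id of one category and the name or slug of another, which is the mismatch the pre_build function avoids.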
Conclusion
Being able to work with existing data is an extremely powerful feature of FakeIt. It can be used to maintain the integrity of randomly generated documents to work with existing systems and can even be used to transform existing data and import it into Couchbase Server.
Published at DZone with permission of Laura Czajkowski, DZone MVB. See the original article here.