
How to Write a Simple Random Test Data Sentence Generator


Looking for a way to generate a lot of random data in order to performance test your application? Read on to find out how!


On the Importance of Test Data

We all know that test data is really, really important.

If you don’t have the right data, you can’t test the functionality. You can’t check that an email is sent out on the customer’s 65th birthday unless you have a customer whose date of birth will trigger that functionality.
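To make the 65th-birthday example concrete, here is a small JavaScript sketch of deliberately specific test data. The `dateOfBirthForBirthday` helper and the customer object are hypothetical, not taken from any real system. The point is that the date of birth is computed from the test date instead of being left to chance. It assumes the target date is not 29 February, because the matching birth date might not exist.

```javascript
// Hypothetical helper: a date of birth that makes a customer turn `age`
// on `onDate`. Works in UTC to avoid time-zone shifts.
// Assumption: onDate is not 29 February (that birth date may not exist).
function dateOfBirthForBirthday(onDate, age) {
  return new Date(
    Date.UTC(onDate.getUTCFullYear() - age, onDate.getUTCMonth(), onDate.getUTCDate())
  );
}

// A customer whose 65th birthday is today, to drive the birthday-email path.
const customer = {
  name: "Test Customer",
  dateOfBirth: dateOfBirthForBirthday(new Date(), 65),
};
console.log(customer.dateOfBirth.toISOString().slice(0, 10));
```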

Some data is not interchangeable: it controls which path the code takes, so it has to be specific.

We know this.

But we don’t always randomize enough of our data, and our test data becomes stale, etc., etc.

One of My Hobbies: Randomly Generating Test Data

Periodically, I write code to randomly generate data. It's easier than writing a full compiler and interpreter, but it still keeps my hand in at parsing text. You can find old notes and tools on test data on my website.

My most recent public test data utility attempts to randomly recreate some of the cartoon ‘slogans’ from my book "Dear Evil Tester":

  • “Of course I’m not Evil… do I look Evil?”
  • “Are you a good little tester? I’m better than that, I’m Eeevil!”
  • “I’m not evil, I’m just doing WHATEVER it takes”
  • etc.

You get the idea. If you want more, you can read the book or try out my Sloganizer online.

My Sloganizer Is a Test Data Generator

Since I’m still learning JavaScript, I created a really simple string generator. If you read the code in sloganizer.html, you’ll see it, but I’m going to explain the algorithm here in case you want to use it in your own test data generation work.

I have an array of strings that are sentence templates, for example:

  • “#start I’m not #im_not”
  • “#start I’m #im_not”

Everything starting with a “#” is a ‘macro’; everything else is a string literal.
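Here is one way the macro/literal split could be sketched in JavaScript. The rule for where a macro name ends isn't stated here, so this sketch assumes a name is made of letters, digits and underscores (the regular expression `/#(\w+)/g`). `tokenize` is a hypothetical helper for illustration, not code from sloganizer.html.

```javascript
// Hypothetical helper (not from sloganizer.html): split a sentence template
// into macro tokens and literal tokens.
// Assumption: a macro is "#" followed by letters, digits or underscores;
// everything else, including spaces and punctuation, is literal text.
function tokenize(template) {
  const tokens = [];
  const macroPattern = /#(\w+)/g;
  let lastIndex = 0;
  let match;
  while ((match = macroPattern.exec(template)) !== null) {
    // Text between the previous macro and this one is a literal.
    if (match.index > lastIndex) {
      tokens.push({ type: "literal", text: template.slice(lastIndex, match.index) });
    }
    tokens.push({ type: "macro", name: match[1] });
    lastIndex = macroPattern.lastIndex;
  }
  // Whatever follows the last macro is also a literal.
  if (lastIndex < template.length) {
    tokens.push({ type: "literal", text: template.slice(lastIndex) });
  }
  return tokens;
}

console.log(tokenize("#start I'm not #im_not"));
```

For the first template, this yields three tokens: the macro `start`, the literal `" I'm not "`, and the macro `im_not`.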

The ‘macros’ are a hash of:

  • ‘key’ - which matches the macro name, e.g. “start” and “im_not”
  • ‘value’ - which is an array of strings, where the string might be another macro or a literal

For example:

 {
   "start" : ["", "Of course", "I honestly believe", "I really do think"],
   "im_not" : ["evil", "good", "nasty", "unpleasant"]
 }

And I have a recursive function which, given a string, will:

  • work through the string;
  • when it finds a ‘macro’ name, randomly choose a string from that macro’s array and expand it in turn (it may contain more macros);
  • when it finds a literal, add it to the output string.
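A minimal JavaScript sketch of this recursive expansion, assuming the same `#name` rule for macros. Instead of walking the string character by character, it uses `String.prototype.replace` with a callback to find each macro. `expand`, `pick` and `generateSentence` are illustrative names, not the functions in sloganizer.html. The random source `rng` is a parameter (defaulting to `Math.random`) so that tests can make the output repeatable.

```javascript
// Sentence templates and macros, taken from the examples in this article.
const templates = ["#start I'm not #im_not", "#start I'm #im_not"];
const macros = {
  start: ["", "Of course", "I honestly believe", "I really do think"],
  im_not: ["evil", "good", "nasty", "unpleasant"],
};

// Pick one element of an array; `rng` returns a number in [0, 1).
function pick(options, rng) {
  return options[Math.floor(rng() * options.length)];
}

// Replace every "#name" with a random alternative from macros[name], then
// expand that alternative too, because it may contain macros itself.
// A missing macro makes `options` undefined inside pick, which throws a TypeError.
function expand(text, macros, rng = Math.random) {
  return text.replace(/#(\w+)/g, (_, name) => expand(pick(macros[name], rng), macros, rng));
}

// Choose a template, expand it, and trim the space left when "start" is "".
function generateSentence(templates, macros, rng = Math.random) {
  return expand(pick(templates, rng), macros, rng).trim();
}

for (let i = 0; i < 3; i++) {
  console.log(generateSentence(templates, macros));
}
```

Passing `() => 0` as `rng` always picks the first alternative, so `generateSentence(templates, macros, () => 0)` returns `"I'm not evil"`.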

Pretty simple.

So “#start I’m not #im_not” might generate:

  • I’m not good
  • Of course I’m not nasty
  • I really do think I’m not evil

The code isn’t particularly forgiving when given bad data:

  • It could get stuck in an infinite loop if a macro string references its own macro.
  • If a macro doesn’t have an entry in the hash, the code will throw an exception.
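Both weaknesses could be guarded against at run time. This hedged sketch adds a depth limit and a readable error for a missing macro. `safeExpand` and the limit of 50 levels are assumptions for illustration. A depth limit can't tell a self-reference from a legitimately deep chain of macros, so the error message only guesses at the cause.

```javascript
// Hypothetical defensive variant of a recursive macro expander.
// maxDepth = 50 is an arbitrary limit chosen for illustration.
function safeExpand(text, macros, rng = Math.random, depth = 0, maxDepth = 50) {
  if (depth > maxDepth) {
    // Very deep nesting usually means a macro (directly or indirectly)
    // references itself.
    throw new Error(`Expansion exceeded ${maxDepth} levels; does a macro reference itself?`);
  }
  return text.replace(/#(\w+)/g, (_, name) => {
    // Only look at the hash's own keys, so "#toString" is not found on the prototype.
    const options = Object.prototype.hasOwnProperty.call(macros, name) ? macros[name] : undefined;
    if (!Array.isArray(options) || options.length === 0) {
      throw new Error(`No alternatives defined for macro #${name}`);
    }
    const choice = options[Math.floor(rng() * options.length)];
    return safeExpand(choice, macros, rng, depth + 1, maxDepth);
  });
}

const macros = { start: ["", "Of course"], im_not: ["evil", "good"], loop: ["again #loop"] };
console.log(safeExpand("#start I'm not #im_not", macros).trim());

try {
  safeExpand("#loop", macros);
} catch (e) {
  console.log(e.message);
}
```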

The code doesn’t ‘compile’ the sentences or phrases to find these problems in advance (although it could if I wrote code to do that).
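A rough idea of what such a ‘compile’ step could look like: check every macro reference against the hash, then search the macro references for cycles with a depth-first search. This is a hypothetical sketch. `validate` and `macroRefs` are made-up names, and the sketch assumes the same `#name` syntax as above.

```javascript
// Hypothetical "compile" step that checks templates and macros up front.
// Assumption: macro references look like "#name" with name matching \w+.
function macroRefs(text) {
  return [...text.matchAll(/#(\w+)/g)].map((m) => m[1]);
}

function validate(templates, macros) {
  const problems = new Set();
  const defined = (name) => Object.prototype.hasOwnProperty.call(macros, name);

  // 1. Every referenced macro, in templates or in macro strings, needs an entry.
  for (const text of [...templates, ...Object.values(macros).flat()]) {
    for (const name of macroRefs(text)) {
      if (!defined(name)) problems.add(`undefined macro #${name}`);
    }
  }

  // 2. Depth-first search over macro references to find cycles.
  const state = new Map(); // name -> "visiting" | "done"
  const visit = (name) => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      problems.add(`macro #${name} can reference itself`);
      return;
    }
    state.set(name, "visiting");
    for (const alternative of macros[name]) {
      for (const ref of macroRefs(alternative)) {
        if (defined(ref)) visit(ref);
      }
    }
    state.set(name, "done");
  };
  Object.keys(macros).forEach((name) => visit(name));

  return [...problems];
}

const templates = ["#start I'm not #im_not", "#start I'm #mood"];
const macros = {
  start: ["", "Of course"],
  im_not: ["evil", "not #im_not"],
};
console.log(validate(templates, macros));
```

A cycle is a warning rather than proof of an infinite loop: `im_not` above still terminates whenever `"evil"` is chosen.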

But it does work, and it will generate thousands, if not millions, of random sentences.
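To see why the numbers grow so quickly: each macro occurrence is an independent choice, so the alternatives for one macro add up and separate occurrences multiply. This sketch counts the possible choice sequences, assuming no macro references itself. Duplicates are counted separately, so the result is an upper bound on distinct sentences. `countExpansions` is a hypothetical helper.

```javascript
// Count the different choice sequences a template can produce.
// Assumption: the macro table has no cycles (otherwise this never returns).
// Identical expansions are counted separately, so this is an upper bound
// on the number of distinct sentences.
function countExpansions(text, macros) {
  let total = 1;
  for (const [, name] of text.matchAll(/#(\w+)/g)) {
    let ways = 0;
    for (const alternative of macros[name]) {
      // Alternatives are mutually exclusive, so their counts add.
      ways += countExpansions(alternative, macros);
    }
    // Each macro occurrence is chosen independently, so counts multiply.
    total *= ways;
  }
  return total;
}

const macros = {
  start: ["", "Of course", "I honestly believe", "I really do think"],
  im_not: ["evil", "good", "nasty", "unpleasant"],
};

console.log(countExpansions("#start I'm not #im_not", macros)); // 16
console.log(countExpansions("#start I'm not #im_not, I'm #im_not", macros)); // 64
```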

What’s the Point?

The point is that:

  • It doesn’t take much to create random data.
  • It doesn’t take a long time to write utility functions to generate random data.
  • Even if you can’t find a library you like for the language you use, you could write your own, or probably repurpose a template engine to create data.

And, more dangerously… it's fun to write random data generation code.


Topics:
performance, data generation, testing

Published at DZone with permission of

