[This article was written by Nir Cohen.]
A new open source Python module to help you with your performance testing, load testing, configurations, data structure, metrics...you name it.
We've all (well, not all) been in that situation where we need to generate random data for any number of reasons. For instance, something I kept running into was testing our Elasticsearch clusters (mappings, config integrity and load) and our Logstash configuration (inputs, filters, outputs). To do so I needed to actually get some logs in there, which is apparently easier said than done.
A good example would be sending logs from the Apache instances across your environment. To test your Apache log filters in Logstash, and how well your Elasticsearch/Logstash cluster holds up under high load, you might decide to go through the whole process of installing Apache on multiple instances and having them send Apache logs. Reconfiguring the logger and managing multiple instances so that they send from different IPs and create high load quickly becomes tedious.
This is where Feeder comes in.
Feeder is a Python module that generates random data in different formats and lets you send it via your transport of choice (for a list of available transports, see the transports section). This enables you to test load, performance, configuration, and data structure regularly. Nifty.
From Up to Destroy
What Feeder means for Cloudify is that we're now able to test our Elasticsearch nodes and Logstash agents to understand how our system behaves. We can easily create a Vagrantfile which automatically brings up some instances with Feeder already running and sends our randomly generated, formatted data to our Cloudify manager. Then, when we're done testing, we can just run vagrant destroy, and we're done.
Feeder has a CLI called "mouth", in which you can define a number of different runtime parameters.
The two main parameters are transports and formatters.
Transports are very simple Python classes that define how the data is sent, and formatters are classes which describe how the data is formatted.
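To make the split between the two concrete, here is a minimal sketch of the idea, not Feeder's actual code. The class names (KeyValueFormatter, InMemoryTransport), the method names (generate, send) and the feed helper are all assumptions made up for illustration; the real classes may look quite different.

```python
import random


class KeyValueFormatter:
    """Hypothetical formatter: builds one 'key=value' message per call."""

    def __init__(self, data):
        # data maps each field name to a list of candidate values.
        self.data = data

    def generate(self):
        # Pick one candidate value per field and join the pairs with spaces.
        return " ".join(
            f"{field}={random.choice(values)}" for field, values in self.data.items()
        )


class InMemoryTransport:
    """Hypothetical transport: 'sends' a message by storing it in a list."""

    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)


def feed(formatter, transport, count):
    """Generate `count` messages and hand each one to the transport."""
    for _ in range(count):
        transport.send(formatter.generate())


formatter = KeyValueFormatter({"level": ["INFO", "ERROR"], "host": ["web-1", "web-2"]})
transport = InMemoryTransport()
feed(formatter, transport, 5)
```

Because the formatter only knows how to produce a message and the transport only knows how to deliver one, either side can be swapped without touching the other.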
In the CLI you can specify:
which transport, formatter and config file to use.
the number of messages you want to send.
the time gap between batches.
the batch size (i.e. the number of messages to send simultaneously).
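The way the message count, the gap and the batch size interact can be sketched roughly as follows. This is an illustration under assumptions, not the CLI's real implementation: the function name send_in_batches and its parameters are hypothetical, and the sketch sends the messages of a batch one after another rather than truly simultaneously.

```python
import time


def send_in_batches(generate, send, messages, batch, gap, sleep=time.sleep):
    """Send `messages` messages in groups of `batch`, pausing `gap` seconds between groups.

    `generate` returns one message, `send` delivers one message.
    `sleep` is injectable so the pause can be replaced in tests.
    """
    sent = 0
    while sent < messages:
        # The last batch may be smaller than the configured batch size.
        size = min(batch, messages - sent)
        for _ in range(size):
            send(generate())
        sent += size
        # Only wait if there is another batch to send.
        if sent < messages:
            sleep(gap)
    return sent


collected = []
total = send_in_batches(lambda: "hello", collected.append, messages=7, batch=3, gap=0.01)
```

With 7 messages and a batch size of 3, this sends batches of 3, 3 and 1, with two pauses in between.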
To define the format and data you want to use, as well as the transport's configuration, you use a Python dict-based configuration file.
Two basic formatters are supplied alongside some additional, application-specific formatters (with more planned).
The Custom formatter lets you define the exact format for your data and offers two ways to define the data itself: use the special $RAND variable, which tells Feeder to randomize the data for a specific field in your format, or define your own static data set.
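As a rough sketch of how those two options could work together, the snippet below renders a format from a dict-based config in which each field is either the $RAND marker or a static list of values. Only the $RAND name comes from the text above; the config layout, the "format" and "data" keys, the placeholder syntax and the render and random_word helpers are assumptions for illustration and are not Feeder's real configuration schema.

```python
import random
import string

RAND = "$RAND"  # marker named in the article; its handling here is an assumption


def random_word(rng, length=8):
    """Return a random lowercase word of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def render(fmt, data, rng=random):
    """Fill the format's placeholders with random or static data."""
    values = {}
    for field, source in data.items():
        if source == RAND:
            # Randomize the field entirely.
            values[field] = random_word(rng)
        else:
            # Pick one entry from the user-supplied static data set.
            values[field] = rng.choice(source)
    return fmt.format(**values)


# Hypothetical dict-based configuration.
config = {
    "format": "{level} {module}: {message}",
    "data": {
        "level": ["INFO", "WARNING", "ERROR"],
        "module": ["auth", "db"],
        "message": RAND,
    },
}

line = render(config["format"], config["data"])
```

Each call to render produces one line such as a level, a module name and an eight-letter random message.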
The JSON formatter only requires you to set the data: a set of fields and their corresponding data objects.
For transports, configuration depends on the transport type. Using the UDP transport, for instance, will require you to define the host and the port you want to send to.
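For the UDP case, a transport that takes a host and a port could look something like the sketch below. It uses only Python's standard socket module; the class name UdpTransport and its send and close methods are hypothetical and are not taken from Feeder's source.

```python
import socket


class UdpTransport:
    """Hypothetical UDP transport configured with a host and a port."""

    def __init__(self, host, port):
        self.address = (host, port)
        # SOCK_DGRAM selects UDP; no connection is established up front.
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, message):
        # UDP carries bytes, so encode the text message first.
        self.sock.sendto(message.encode("utf-8"), self.address)

    def close(self):
        self.sock.close()
```

Pointing it at a listener, for example a Logstash UDP input, is then a matter of creating UdpTransport with that listener's host and port and calling send for every generated message.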
Feeder in Cloudify
With Cloudify, this will now be our primary tool to test our monitoring and logging architecture, which before required a lot more manual labor and time investment. I'm also hoping that it will be used to test our REST API load and ability to withstand problematic requests.
Feeder is not only about testing logging pipelines...in the docs you'll be able to find an excellent use case on how to use Feeder to feed metrics. Feeder can actually be used to send any type of data via any type of transport by writing very little code.
I'll just set the scene and then let you read the rest.
Let's say you're leading a monitoring project, where you need to build your first Graphite cluster, and transport your metrics via AMQP. You have your RabbitMQ cluster installed and Graphite is ready to pull metrics, but it's important that you feed multiple metrics with multiple values and multiple namespaces to test the integrity of your cluster's configuration and performance. But you want to do all this without having to install StatsD or Diamond on all of your instances and configure them to send metrics... and then find out that your configuration is incorrect and iterate over it 11 times…
This is the sort of scenario Feeder is meant to solve: complex testing scenarios where you want to send metrics from two availability zones and three types of middleware servers, where each has three base metric types, each with average and max measurements. Now you can - read the full use case and see the code.
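To get a feel for how quickly that scenario grows, here is a small sketch that enumerates every metric namespace it implies. The zone, middleware and metric names are made up for illustration, and the dotted naming scheme is an assumption; the real use case in the docs may name things differently.

```python
import itertools

# Hypothetical names; only the counts (2, 3, 3, 2) come from the scenario.
ZONES = ["zone-a", "zone-b"]
MIDDLEWARE = ["web", "queue", "db"]
METRICS = ["cpu", "memory", "connections"]
MEASUREMENTS = ["avg", "max"]


def metric_names():
    """Return one dotted metric name per combination of the four dimensions."""
    return [
        ".".join(parts)
        for parts in itertools.product(ZONES, MIDDLEWARE, METRICS, MEASUREMENTS)
    ]


names = metric_names()
print(len(names))  # 2 * 3 * 3 * 2 = 36
```

That is 36 distinct metric series to generate, which is exactly the kind of volume that is painful to produce by configuring real agents on real instances.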
If you're interested in contributing formatters and transports to the project you can fork the repo, and have fun.