
Automated Integration Testing

Learn how to run integration tests automatically, preventing human error and producing repeatable results for further analysis.


In this article, I am trying to summarize my experience with integration testing. I will cover the following topics:

  • Running integration tests automatically and avoiding human-introduced mistakes.

  • Making test results repeatable.

  • Running tests on a system under various conditions.

  • Producing exhaustive test results suitable for further analysis.

  • Designing a robust test bed that recovers to a clean state.

System Under Test

First, let’s consider the subject of integration testing - the system under test. Our team deals with operations support systems (OSS) for telecommunications.


An OSS solution typically consists of several applications running on different machines; each application provides a subset of some domain-specific functions. For example, the Fault Management System (FMS) aggregates fault signals from several Element Management Systems (EMS), then the Expert System analyzes these signals and makes a decision on raising a trouble ticket in the Trouble Ticketing System (TTS).

The OSS industry is notorious as a realm of legacy software. Some domain-specific systems have a monolithic design that crams in as many functions as possible. Usually, different systems come from different vendors. As a rule, such systems require tedious and error-prone manual operations to install them, connect them with other applications, and finally make them work. With such solution components, integration testing is a difficult and expensive task. Automated integration testing is a challenge.

A reader familiar with today’s continuous integration (CI) practices could say: “Hey man, split your solution into atomic microservices, package these services in containers, manage them with any orchestration tool, deploy everything in the cloud instantly, independently for each test execution...” If this is your case, then you are happy enough in your life and this article is of little use to you. This article is about solutions built upon legacy applications, about long-lived and spaghetti-designed software. I am taking up the challenge of applying CI practices to integration testing of such solutions.

Test Bed Model

For simplicity, I will consider a model rather than a real solution. The model consists of three components:


  1. Data Producer

  2. Messaging System

  3. Data Consumer

Each component runs on a separate host. Practices developed for this simplistic test bed are applicable to a real system comprised of many components.

The Result We Want

An essential property of any experiment is result repeatability. In testing, this boils down to the following statement: for the same input data and test conditions, we obtain the same results (within measurement accuracy). Unrepeatable test results indicate problems with the testing approach: variance in test conditions or wrong input data. Only with accurate and repeatable results at hand can we judge the system under test. We will see later how to design a test bed that produces repeatable results.

The first kind of test output is log files. We have to collect all the log files relevant to a particular test execution from all the system components. We have to store these log files immediately after the test execution is done. Of course, this category also includes core-files, crash reports, and heap dumps - in other words, any troubleshooting information generated by the system during the test execution.

The second kind of test results is performance metrics. Such metrics may include

  • CPU and memory usage statistics,

  • Disk read/write statistics,

  • Network send/receive statistics.

In addition, we definitely want to have some application-level metrics. With our simplistic test bed model, these metrics may include

  • End-to-end message latency - time for a message to pass from Producer to Consumer,

  • Delivery latency - how long a message stays in the Messaging System awaiting consumption,

  • Send rate for Producer,

  • Receive rate for Consumer,

  • Send queue size statistics for Producer,

  • Processing queue size statistics for Consumer,

  • Message size statistics.

We may want some of these metrics accumulated over the whole test run and some of them averaged over a sliding window.

As you can see, we need to equip the test bed with monitoring tools in order to get the most out of it. These monitoring tools run along with each test; they should have a negligible footprint and must not affect the test results.
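
For illustration only, here is a rough post-processing sketch in Groovy. It assumes a hypothetical log format in which the Producer and the Consumer each write one line per message - "messageId,epochMillis" - and that the host clocks are synchronized:

// Post-processing sketch for end-to-end latency (assumes NTP-synchronized host clocks).
// Hypothetical format: each application logs one line "messageId,epochMillis" per message.
def sendTimes = [:]
new File('producer-times.csv').eachLine { line ->
    def (id, ts) = line.tokenize(',')
    sendTimes[id] = ts.toLong()
}

def latencies = []
new File('consumer-times.csv').eachLine { line ->
    def (id, ts) = line.tokenize(',')
    if (sendTimes.containsKey(id)) {
        latencies << (ts.toLong() - sendTimes[id])
    }
}

println "Messages delivered: ${latencies.size()}"
println "Average latency, ms: ${latencies.sum() / latencies.size()}"
println "Maximum latency, ms: ${latencies.max()}"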

Reset Button for the Test Bed


The test bed must have one mandatory feature: a Reset button - something that brings the test bed to an initial, clean, and correct state from any state. This feature is vital for obtaining repeatable test results. Whatever incorrect situation occurred during the tests, before the next test execution we just push Reset. Hence, each test execution starts on a clean test bed, unaffected by previous tests.

The most straightforward way to implement the Reset button is to reinstall all the components of the system. For some applications, however, this is not easy to do. Some software does not provide an uninstall feature at all. Other software has an uninstall feature but forgets to clean up some files; these stale files corrupt the re-installed application. As a rule, complex monolithic applications require sophisticated manual tweaks during installation. Even the order of installing and starting sometimes matters; for example, a consumer falls into a blocked state immediately after start if there is no Messaging System available.

The bitter truth is that we have to build it anyway. We have to develop a magic machine that bypasses all the pitfalls and resurrects the system from the ashes. In order to do this, we need to learn the ins and outs of every system component.

You can use any automation tool for the implementation. It could be an automated configuration management tool like Ansible or a scripting language like Python. Our team uses Groovy scripts for automated installation. The tool is unimportant; what matters is the outcome: the test bed is back in its initial state.

How can we implement the Reset button for our test bed model? Let me express it in Groovy-like pseudocode:

// Stop
atHost 'host1', { stopProducer }
atHost 'host3', { stopConsumer }
atHost 'host2', { stopMessagingSystem }
// Uninstall
atHost 'host1', { uninstallProducer }
atHost 'host3', { uninstallConsumer }
atHost 'host2', { uninstallMessagingSystem }
// Install
atHost 'host2', { installMessagingSystem }
atHost 'host3', { installConsumer }
atHost 'host1', { installProducer }
// Start
atHost 'host2', { startMessagingSystem }
atHost 'host3', { startConsumer }
atHost 'host1', { startProducer }
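
The atHost helper above is pseudocode. As a minimal sketch (not our actual implementation), a plain-Groovy variant could simply shell out to ssh, assuming key-based authentication to the test bed hosts is already set up:

// Run a shell command on a remote host via ssh and fail loudly on errors
def atHost(String host, String command) {
    def proc = ['ssh', host, command].execute()
    def output = proc.in.text        // blocks until the remote command finishes
    proc.waitFor()
    if (proc.exitValue() != 0) {
        throw new RuntimeException("Command failed on ${host}: ${proc.err.text}")
    }
    return output
}

// Usage, mirroring the pseudocode above (the stop script path is hypothetical)
atHost('host2', '/opt/messaging/bin/stop.sh')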

Note that this approach implements one of the essential Continuous Integration and Continuous Delivery principles - configuration as code. The installation scripts codify all configuration adjustments, all tricks and workarounds. There is no secret manual tuning known only to the team guru. Everyone can see all the steps in the scripts and track them in the source control system.

Running the Tests

Running a single test is not a trivial task when the system under test spans many machines. To imagine the complexity, let’s consider a test scenario for our test bed model:

  1. [host1] Configure Producer to send 10 messages of 1 KB size every second.

  2. [host2] Configure Messaging System to use a message buffer of 1 MB size.

  3. [host3] Configure Consumer to receive messages with 1 MB buffer.

  4. [host3] Configure Consumer to perform message enrichment.

  5. [host1] Send messages for 1 hour.

  6. [all hosts] Collect test results.

  7. [all hosts] Cleanup.

This oversimplified scenario gives an idea of the complexity of integration testing. A real solution composed of dozens of domain-specific applications requires hundreds of steps - set a parameter, start a process, wait for an event, get some data, and so on. These steps are spread across dozens of machines. Moreover, some steps require a strict execution order.


The idea of automation naturally comes from the admission that humans are not good at such complex, tedious, and repetitive activities. A human inevitably makes mistakes - for example, misses a configuration step or logs in to the wrong host. We need a machine general that relentlessly commands a parade of well-disciplined machine soldiers.

Our team employs Ansible - a simple yet powerful automation engine. We describe our test bed in terms of an Ansible inventory. We code test scenarios as Ansible playbooks. Let’s see how this could look for the scenario described at the beginning of this section.

Test bed description - Ansible inventory file, test_bed_inventory:

[producer]
host1

[messaging_system]
host2

[consumer]
host3

Test scenario - Ansible playbook. It is a YAML file, test-scenario.yml:

---

- hosts: producer
  roles:
  - configure_producer

- hosts: messaging_system
  roles:
  - configure_messagingsystem

- hosts: consumer
  roles:
  - configure_consumer

- hosts: producer
  roles:
  - send_messages

- hosts: all
  roles:
  - collect_results

- hosts: all
  roles:
  - cleanup

Run the scenario with parameters:

ansible-playbook -i test_bed_inventory \
  -e producer_message_rate=10 \
  -e producer_message_size_kb=1 \
  -e message_buffer_size_kb=1024 \
  -e consumer_enrichment=true \
  -e test_interval_min=60 \
  test-scenario.yml

We specify the testing parameters as command-line arguments. Ansible passes them to the roles that comprise the playbook. This externalization of test parameters is important; we will see later how it helps us test under various conditions.
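
For instance, the configure_producer role could consume these variables when rendering the application configuration. In this sketch, the file paths and property names are hypothetical:

# roles/configure_producer/tasks/main.yml
- name: Render Producer configuration from test parameters
  template:
    src: producer.properties.j2
    dest: /opt/producer/conf/producer.properties

# roles/configure_producer/templates/producer.properties.j2 would contain, for example:
#   message.rate={{ producer_message_rate }}
#   message.size.kb={{ producer_message_size_kb }}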

Besides automated execution across many hosts, this approach has the following advantages:

  • Test bed description is separated from test scenarios. We can easily switch to another test bed simply by supplying another inventory file.

  • Configuration as code: test bed description, test scenarios, test parameters, application settings - everything is stored in plain text files under source control.

  • Encapsulation of elementary steps into Ansible roles provides building blocks for future test scenarios - write once and then reuse.

  • A good starting point for further automation, as we will see later.

Testing Under Various Conditions

In practice, we want to validate the system within some range of operation conditions. Typically, we would like to pin three points within the continuous range: low, moderate, and high. Applying this to our test bed model, we can prepare a matrix of test parameters.

For simplicity, I vary only two parameters: producer message rate and message size. For a real system, the set of parameters defining test conditions is substantially larger. In practice, we have a multidimensional configuration matrix.

Well, it’s a good time to employ Jenkins. This powerful CI system provides the Matrix Project Plugin, which implements exactly such a configuration matrix. Let’s create a matrix job, TestScenario1, that will run our test scenario.


In order to compose this matrix, I created two user-defined axes: producer_message_rate and message_size. Each axis declares a set of three values. After the job starts, Jenkins will execute the test scenario for each cell in the matrix.

The only missing thing is how to connect the Jenkins configuration matrix to the Ansible playbook implementing the test scenario. Jenkins' Ansible Plugin is exactly what we need. With this plugin, we invoke our scenario playbook, passing parameters from the configuration matrix as parameters of the playbook.
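
Whether through the plugin's extra variables or a plain shell build step, the idea is the same: Jenkins exposes the user-defined axis values to each matrix cell as environment variables, and we forward them to the playbook. A shell-step sketch (with the remaining parameters fixed) could look like this:

# Each matrix cell forwards its axis values to the playbook
ansible-playbook -i test_bed_inventory \
  -e producer_message_rate=$producer_message_rate \
  -e producer_message_size_kb=$message_size \
  -e message_buffer_size_kb=1024 \
  -e consumer_enrichment=true \
  -e test_interval_min=60 \
  test-scenario.yml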

Jenkins is much more than just a test scenario launcher. Additional benefits of using Jenkins are:

  • Visibility of the testing process. Everyone can see how the tests are running, what the current status is, what the test conditions are, and so on.

  • Traceability for tools used in tests. For example, it is possible to configure the test execution job to grab some specific version of a simulator or a monitoring tool. It is possible to track artifacts with MD5 fingerprints.

  • A repository of test results. Jenkins has the capability to archive artifacts after the job is done. Just set up archiving for the test results - log files, metric files - and you will have access to the results of each test execution for each test configuration. Moreover, Jenkins provides a REST API to grab artifacts; this feature is very useful for automated results processing (see the example after this list).

  • Graphical interface - yes, it is nicer to press buttons than to enter shell commands!
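
For example, everything archived by a particular build can be downloaded as a single zip through the artifact endpoint; the server URL, credentials, and build number below are hypothetical:

# Fetch all archived artifacts of build #42 of the TestScenario1 job as one zip
curl -u user:api_token -o results.zip \
  "https://jenkins.example.com/job/TestScenario1/42/artifact/*zip*/archive.zip"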

Often, we have only one test bed. In other words, only one test execution can run at any point in time. In this case, it is necessary to ensure sequential (rather than parallel) execution of the configuration matrix elements. A matrix job has such an option.

When testing a real solution, it turns out that each test scenario runs for a long time. The configuration matrix, on the other hand, may have dozens of cells in multiple dimensions. In some cases, a full matrix execution may require several days. Often, we want to run just a subset of configurations rather than the entire set. The Matrix Reloaded Plugin gives us the capability to rerun only certain elements of the configuration matrix.

Handling Exceptional Cases

Now, when we are about to ride out into the wild, it’s time to hold our horses and think once again. We have forgotten to consider exceptional cases.

Let’s have a closer look at the playbook that implements our simplistic test scenario. Suppose that the Producer exits with an error and this step of the playbook fails:

- hosts: producer
  roles:
  - send_messages

In this case, the next step, which collects test results, will not start. The results might include log files or core files containing important troubleshooting information, but we will not see them. Just think of it: our test bed delivers results only for successful tests. I would argue that such a test bed is useless.

Another problem is that in the case of a test failure, the cleanup step also never starts. Hence, the next test execution will start on a corrupted test bed. The entire test campaign is spoiled. Human intervention is necessary to stop this avalanche of spoiled tests and bring the test bed back to a clean state. We definitely do not want this kind of automation.

In our team, we handle error cases by using Jenkins Pipelines. A Jenkins pipeline describes the workflow in the Pipeline domain-specific language (DSL). When started, a pipeline build fetches the Jenkinsfile with the pipeline definition from the source control system. The Pipeline DSL is an extension of the Groovy language; it provides syntax constructs for using the majority of Jenkins plugins and features.

Here is a sketch of the pipeline for our dummy test scenario:

try {
    // Run test scenario
    ansiblePlaybook extras: "… test parameters …",
                    inventory: "test_bed_inventory",
                    playbook: "test-scenario.yml"
} finally {
    try {
        // Either success or failure - collect test results
        ansiblePlaybook extras: "… test parameters …",
                        inventory: "test_bed_inventory",
                        playbook: "collect-results.yml"
    } finally {
        // In any case, cleanup the test bed
        ansiblePlaybook extras: "… test parameters …",
                        inventory: "test_bed_inventory",
                        playbook: "test-bed-cleanup.yml"
    }
}

Note that now we have three separate Ansible playbooks:

  • To run the test scenario - test-scenario.yml

  • To collect test results - collect-results.yml

  • To clean up the test bed - test-bed-cleanup.yml

The test-scenario.yml playbook is not limited to test steps. Most probably, this playbook also contains

  • Test bed sanity checks,

  • Deployment tasks for various testing tools like simulators and monitors,

  • Preparation tasks like configuration adjustments depending on testing conditions.

The collect-results.yml playbook contains tasks to collect all interesting test artifacts from all machines of the test bed.
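
A rough sketch of such a playbook, using Ansible's find and fetch modules (the log location and file patterns are hypothetical):

---
- hosts: all
  tasks:
  - name: Find artifacts produced by the test execution
    find:
      paths: /var/log/oss
      patterns:
        - "*.log"
        - "*.hprof"
    register: test_artifacts

  - name: Pull the artifacts to the controller, one directory per host
    fetch:
      src: "{{ item.path }}"
      dest: "results/{{ inventory_hostname }}/"
      flat: yes
    with_items: "{{ test_artifacts.files }}"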

The test-bed-cleanup.yml playbook does the vital job: it brings the test bed back to the initial state. This playbook is responsible for the repeatability and accuracy of test results. For some systems, the cleanup playbook may simply invoke the Reset button. For other systems, a full reset might be too long an operation, and the cleanup playbook might have a more lightweight implementation. Either way, it must do the job: ensure the test bed is clean.
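
For example, a lightweight cleanup for the Messaging System alone might stop the process, purge its persisted data, and start it again; the script paths and data directory below are hypothetical:

---
- hosts: messaging_system
  tasks:
  - name: Stop the Messaging System
    command: /opt/messaging/bin/stop.sh

  - name: Remove the message store left over from the test
    file:
      path: /opt/messaging/data
      state: absent

  - name: Recreate an empty message store directory
    file:
      path: /opt/messaging/data
      state: directory

  - name: Start the Messaging System with a clean store
    command: /opt/messaging/bin/start.sh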

Overcoming Difficulties


At present, not all the functionality we need is available in the Jenkins plugins we use. The most discouraging limitation is that one cannot have a matrix pipeline job. If I have a set of 3x3 test configurations, then I want to run each of the 9 configurations independently, with correct exception handling. Unfortunately, a pipeline job cannot be a matrix job.

We overcome this problem by using parameterized jobs. For example, declare two parameters:

  • producerMessageRates = "10 100 1000"

  • messageSizes = "10 100 1000"

In the Jenkinsfile, iterate over all values for both parameters:

// Split the space-separated parameter strings before iterating
for (rate in producerMessageRates.tokenize()) {
    for (size in messageSizes.tokenize()) {
        …
    }
}

This is far from elegant, but it works. Just note that the number of nested loops is equal to the number of varying parameters.

Following this path, we run into another problem: if some test configuration fails, subsequent test configurations will not even start. Of course, we can wrap each configuration into a try-catch Groovy construct, so the next configuration runs even if the previous one failed (see the sketch below). However, in this case, test failures are not visible: the entire job reports a "success" status even though some tests failed. The whole point of testing is to make failures visible; a lack of visibility is a major deficiency.
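
A sketch of such wrapping - and of why it hides failures - where runTestConfiguration is a hypothetical helper encapsulating the try/finally pipeline shown above:

for (rate in producerMessageRates.tokenize()) {
    for (size in messageSizes.tokenize()) {
        try {
            // One test configuration, parameterized with the current rate and size
            runTestConfiguration(rate, size)
        } catch (err) {
            // The loop continues with the next configuration, but nothing marks
            // the build as failed - it still finishes with a green "success" status
            echo "Configuration rate=${rate}, size=${size} failed: ${err}"
        }
    }
}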

Finally, we found a workaround: use the parallel Jenkinsfile construct, but limit the number of Jenkins slave nodes to exactly two. With more than two nodes, Jenkins would try to start more than one test configuration in parallel; we cannot allow this because we have only one test bed. With only one node, this machinery simply does not work. So many hacks just to emulate the Matrix plugin!

A cherry on the cake: you get a classic deadlock if you start two such pipeline jobs with two slave nodes. Mind your step.

Although the Jenkinsfile syntax derives from Groovy, it still lacks some Groovy capabilities. One of the major disadvantages, in my opinion, is that one cannot use closures in Jenkinsfiles. That is why Jenkinsfiles sometimes look cumbersome and scary.

Despite all the described disadvantages, the Jenkins ecosystem is very rich and powerful. One can usually find a way to overcome difficulties.

What's Next?

To conclude this article, I would like to think a bit about future improvements. I am talking about improvements in our testing system rather than in the systems that we test.

In addition to the hosts with solution components, our test bed has some hosts with testing tools. These auxiliary hosts include Jenkins slaves and machines with simulators and monitoring tools. There is no need to keep these machines always up and running. We could spin them up immediately before launching a test set and recycle them as soon as the test set is done. The Jenkins ecosystem provides ways to create virtual machines on demand in OpenStack. This is a way to minimize hardware usage.

I do not cover one very important aspect of testing here: processing test results. Right now, we analyze test result files manually, with some minimal automation. However, our test bed produces hundreds of log and metric files. Manual analysis becomes more and more difficult. Moreover, manual analysis is error-prone: humans inevitably make mistakes. It would be a sad story to have an ideal test bed but to draw wrong conclusions just because of a mistake during result analysis.

This means that we need to involve some techniques of automated data analysis. However, that is the subject of another article.


Published at DZone with permission of Arseniy Tashoyan.
