How Much Testing Is Enough?

Understanding test results with bncov and coverage analysis.

By Mark Griffin · May 10, 2019

A frequently asked question in software testing is: “Is that enough testing, or should we do more?” Whether you’re writing unit tests for your programs or finding bugs in closed-source third-party software, knowing what code you have and have not covered is an important piece of information. In this article, we’ll introduce bncov, an open-source tool developed by ForAllSecure (available on GitHub), and demonstrate how it can be used to answer common questions that arise in software testing.

At its core, bncov is a code coverage analysis tool. While there are several well-known tools that offer visibility into code coverage, we wanted to build a solution that enhanced and/or extended functionality in the following areas:

  1. Easily scriptable. Scriptability is a key feature to align with larger analysis efforts and for combining with other tools.
  2. Strong data presentation. Good visualizations quicken and enhance understanding.
  3. Fuzzing/testing workflow compatible. Tools that exactly fit your needs increase productivity and speed.
  4. Supports binary targets. Sometimes you don’t have the original source code.

While existing code coverage tools are good at some of these, our main focus was scriptability because of our requirements for flexibility. The driving purpose is to answer common questions in software testing that often require combining information from static and dynamic analysis, so flexibility matters for handling a wide variety of potential questions. We found that a Binary Ninja plugin fits this perfectly because it allows users to easily leverage Binary Ninja’s analysis from a Python scripting environment.

The workflow for using bncov is a three-step process. While the first step is up to you, we’ve made the other steps easy to pipeline:

  1. Generate test cases. Test cases can be generated by any approach, from fuzzing solutions to manual test case development.
  2. Generate coverage data from those test cases.
  3. Run analysis and display output with bncov.

After running the normal install process for Binary Ninja plugins (instructions available here), the first step is to collect coverage information. This is done by running your target program on your inputs (also known as input files or seeds) and collecting coverage in the drcov format (from DynamoRIO’s built-in drcov tool). We’ve packaged a script to make this easier, but it’s nothing a simple bash loop couldn’t accomplish. It’s important to pay attention to the data that’s collected, because it is what ends up in the plugin and forms the basis of our analyses. The coverage files generated by drcov record which basic blocks are executed, but not the order or the number of times they execute.
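If you prefer to roll your own collection loop rather than use the packaged script, a minimal sketch of the idea follows. The paths (drrun, the target harness, the seed and output directories) are placeholders, and exact drcov options can vary across DynamoRIO versions:

# Minimal sketch: run the target under DynamoRIO's drcov once per seed file.
# DRRUN, TARGET, and the directory names below are placeholders.
import os
import subprocess

DRRUN = "/opt/dynamorio/bin64/drrun"   # wherever DynamoRIO is installed
TARGET = "./xml_parser"                # harness that parses one input file
SEED_DIR = "seeds"
COVERAGE_DIR = "coverage-output"

os.makedirs(COVERAGE_DIR, exist_ok=True)
for seed in sorted(os.listdir(SEED_DIR)):
    seed_path = os.path.join(SEED_DIR, seed)
    # drcov writes one coverage log per run into -logdir
    subprocess.run(
        [DRRUN, "-t", "drcov", "-logdir", COVERAGE_DIR, "--", TARGET, seed_path],
        check=True,
    )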

With the information from the coverage files, we can now visualize block coverage using bncov. Import the whole directory of coverage files, and you’ll see covered blocks colored in heatmap fashion, painted from blue through purple to red. Redder hues indicate that a block was covered by a smaller percentage of input files (i.e. the block is “rare” among the inputs), while bluer hues show blocks with a higher percentage, indicating more common code paths. Blocks that have not been covered at all are not recolored. This color scheme allows users to instantly see which blocks have been tested and which code paths are common as they review functions.

[Figure 1: The smaller the relative percentage of test cases that cover the block (“the rarer it is”), the more reddish it is.]
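To make the color scheme concrete, here is a small, self-contained sketch of the underlying idea. This is only an illustration of the math behind the heatmap, not bncov’s actual implementation, and per_seed_blocks is a hypothetical mapping from a coverage file name to the set of block addresses it covered:

# Illustration: derive a rarity heatmap from per-seed basic block sets.
def block_heat(per_seed_blocks):
    """Map each covered block address to an (R, G, B) tuple: rare=red, common=blue."""
    num_seeds = len(per_seed_blocks)
    counts = {}
    for blocks in per_seed_blocks.values():
        for addr in blocks:
            counts[addr] = counts.get(addr, 0) + 1
    heat = {}
    for addr, count in counts.items():
        fraction = count / num_seeds      # 1.0 means covered by every input
        heat[addr] = (int(255 * (1 - fraction)), 0, int(255 * fraction))
    return heat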

Coverage visualization is very helpful for manual analysis, but bncov’s real differentiator is its scripting flexibility and ability to automate analysis. The same coverage data that drives the visualization can be used from Binary Ninja’s built-in scripting console or from a normal Python environment (headless use requires a Binary Ninja commercial license), allowing additional analyses that leverage Binary Ninja’s existing knowledge of the binary. The ability to programmatically reason about code coverage over a set of input files is extremely powerful, and we’ve provided some built-in examples as starting points, such as the GUI commands “Highlight Rare Blocks” and “Highlight Coverage Frontier.” These highlight and log blocks that are covered by only a single coverage file and blocks that have an outgoing edge to an uncovered block, respectively. Users can build interesting analyses on top of these building blocks to answer challenging questions, such as the one we started with: “Should we do more testing?”

[Figure 2: Blocks highlighted in green are in the “Coverage Frontier” — meaning they have an outgoing edge that isn’t covered.]
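For instance, the “rare blocks” idea reduces to a set computation over per-trace coverage. Here is a minimal sketch, again over a plain dictionary of per-file block sets rather than bncov’s own classes (the frontier analysis additionally needs Binary Ninja’s CFG to find outgoing edges):

# Illustration: blocks that appear in exactly one coverage file are "rare".
def rare_blocks(per_seed_blocks):
    seen = set()        # blocks covered by at least one file
    repeated = set()    # blocks covered by two or more files
    for blocks in per_seed_blocks.values():
        repeated |= seen & blocks
        seen |= blocks
    return seen - repeated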

As a demonstration, let’s walk through an open-source project that has built-in test resources. The XML library TinyXML-2 (https://github.com/leethomason/tinyxml2) is an excellent example because it is a compact library that includes a test program, test inputs, and a Google OSS-Fuzz harness. If users choose to conduct additional testing (like fuzzing), it’s helpful to understand what code the built-in test cases cover and how much more coverage fuzzing yields. This process is simplified by a bncov script that compares coverage between the sets of coverage files from before and after fuzzing. The code below is the heart of the coverage comparison from that script:

# bv is Binary Ninja's BinaryView object for the target file
# CoverageDB is bncov's class that represents coverage information
from bncov import coverage  # exact import path may differ depending on how bncov is installed

# Each directory holds the drcov files collected for one set of inputs
first_covdb = coverage.CoverageDB(bv, first_coverage_dir)
second_covdb = coverage.CoverageDB(bv, second_coverage_dir)

# Blocks covered by the first input set but not by the second
unique_to_first = first_covdb.total_coverage - second_covdb.total_coverage
function_mapping = first_covdb.get_functions_from_blocks(unique_to_first)
for function, blocks in function_mapping.items():
    print("    %s: %s" % (function, [hex(b) for b in blocks]))

[Figure 3: Comparing coverage between sets of inputs with bncov]

We’ll start the analysis with three sets of initial inputs:

  1. The test XML files included in the resources directory.
  2. XML inputs extracted from TinyXML-2’s test binary.
  3. A set of XML files gathered from multiple test suites on the Internet.

First, we collect coverage using bncov’s drcov automation script on each input set to establish the baseline level of coverage we get from the different inputs. We wrote a simple program that uses TinyXML-2 to parse and print input files, which we used as our target for collecting coverage (and later for fuzzing). The baseline results show that the extracted test cases offer significantly more coverage than the test cases from the resources directory, which makes sense because the test binary includes all the tests from the resources. Also, as you might expect, the combination of multiple external test suites has the most coverage among the initial input sets.
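A rough sketch of that baseline comparison, using the same CoverageDB interface shown earlier, might look like the following; the directory names are placeholders for wherever each input set’s drcov files were written:

# Compare baseline block coverage across the three input sets.
# bv is the Binary Ninja BinaryView for the target, as in the earlier snippet.
from bncov import coverage  # import path may differ depending on how bncov is installed

input_sets = {
    "resources directory": "coverage-resources",
    "extracted test cases": "coverage-extracted",
    "external test suite": "coverage-external",
}
covdbs = {name: coverage.CoverageDB(bv, path) for name, path in input_sets.items()}
for name, covdb in covdbs.items():
    print("%s: %d blocks covered" % (name, len(covdb.total_coverage)))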

By fuzzing our target program with each of the input sets, we explore new code paths in TinyXML-2 by generating test cases that cover basic blocks the initial sets do not. The results of fuzzing will vary greatly depending on multiple factors: how long the fuzzer runs, how fast the target program is, the kind of input processing the target does, the quality of the starting input set, the capabilities of the fuzzer, and so on. In our case, though, we want to compare coverage and look for relative increases in block coverage across the input sets, so we simply fuzzed each input set with AFL for the same period of time. Once the fuzzing finished, we compared the results using one of the scripts included with bncov.
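If you want to script the fuzzing step itself, a minimal sketch might look like this (the comparison output it feeds into is shown in Figure 4 below); the harness name, seed directories, and time budget are placeholders, and your afl-fuzz invocation will depend on how the target reads input:

# Sketch: fuzz each input set with AFL for the same wall-clock budget.
import subprocess

FUZZ_SECONDS = 4 * 60 * 60  # example budget; pick whatever is appropriate
seed_dirs = {
    "resources": "seeds-resources",
    "extracted": "seeds-extracted",
    "external": "seeds-external",
}
for name, seed_dir in seed_dirs.items():
    try:
        # "@@" tells AFL to substitute the path of the current test case
        subprocess.run(
            ["afl-fuzz", "-i", seed_dir, "-o", "afl-out-" + name,
             "--", "./xml_parser", "@@"],
            timeout=FUZZ_SECONDS,
        )
    except subprocess.TimeoutExpired:
        pass  # budget reached; queued test cases are in afl-out-<name>/queue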

[Figure 4: Coverage comparison script output.]

As expected, we saw increased coverage for each input set after a short fuzzing run. Although the gap in the number of blocks covered between the input sets narrows after fuzzing, certain blocks were only found with the external suite. This result makes sense, as some input constructs are hard for a fuzzer like AFL to synthesize through random mutation. This is where symbolic execution, a technology within our Mayhem solution, can often help by solving for inputs that are unlikely to be discovered by a fuzzer’s mutations alone.

Input Set               Blocks Covered   Unique Blocks   Blocks After Fuzzing   Unique Blocks After Fuzzing
Resources Directory     647              0               703                    0
Extracted Test Cases    719              3               743                    1
External Test Suite     744              28              747                    14

[Figure 5: Block coverage per input set, before and after fuzzing.]

Using the script output, we can now start to answer “how much testing is enough?” With bncov, users have data points that show which functions have been exercised and which basic blocks are not covered by the existing test cases. With the included coverage frontier analysis, we can also see the boundary between existing test inputs and untouched code, allowing users to automatically identify functions that could benefit from further exploration. This type of analysis quickly deepens a user’s understanding of the target code, and that is exactly the kind of information needed to answer “how much is enough.”

# Run from Binary Ninja's scripting console after importing the coverage files
import bncov

# covdb is the CoverageDB bncov maintains for the current view
frontier = bncov.covdb.get_frontier()
function_mapping = bncov.covdb.get_functions_from_blocks(frontier)
for function_name, blocks in function_mapping.items():
    print("%s has %d frontier blocks" % (function_name, len(blocks)))

[Figure 6: Enumerate frontier blocks for each function.]

Coverage analysis, and using coverage information to enhance fuzzing, is an active and developing research area. Using bncov to reason about coverage is a step forward because it enables analysis automation and the flexible reasoning required to apply targeted techniques that augment fuzzing, such as directed symbolic execution. We’ll share more on these advanced topics in a future installment, but in the meantime you can fork bncov on GitHub and experiment for yourself! We hope it helps you get a better understanding of your testing coverage and discover code paths you might be missing.
