Testing Spark Code

An introduction to testing your Apache Spark code with Scala.

There are many ways to test your Spark code. If it's Java, you can write basic JUnit tests for the non-Spark pieces; if it's Scala, you can use ScalaTest. You can also run full integration tests by running Spark locally or on a small test cluster.
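As a minimal sketch of testing the non-Spark pieces, the logic of a job can be kept in ordinary functions and checked without any SparkContext. The names `LineParsing`, `tokenize`, and `wordCounts` are hypothetical and chosen only for illustration; in a real project the assertions would more likely live in a ScalaTest or JUnit suite than in a `main` method.

```scala
// Plain Scala with no Spark dependency: the logic under test is an ordinary function.
// All names here are hypothetical and exist only for this illustration.
object LineParsing {

  /** Splits a line into whitespace-separated tokens, dropping empty ones. */
  def tokenize(line: String): List[String] =
    line.trim.split("\\s+").filter(_.nonEmpty).toList

  /** Counts tokens across lines; a local stand-in for what a job would do at scale. */
  def wordCounts(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(tokenize).groupBy(identity).map { case (word, hits) => word -> hits.size }

  def main(args: Array[String]): Unit = {
    // In a Spark job the same function could be handed to a transformation,
    // e.g. rdd.flatMap(LineParsing.tokenize), so checking it here covers that logic.
    assert(tokenize("hi holden") == List("hi", "holden"))
    assert(tokenize("   ") == Nil)
    assert(wordCounts(Seq("hi", "hi holden", "bye")) == Map("hi" -> 2, "holden" -> 1, "bye" -> 1))
    println("all checks passed")
  }
}
```

Keeping the parsing and aggregation rules outside of Spark-specific code means these checks run quickly and need no cluster, which leaves the slower local or small-cluster runs for verifying how the pieces are wired together.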

Another awesome choice is Holden Karau's spark-testing-base.

There are a few presentations and articles about doing so:

Add it to your SBT build for Spark 1.6.0, and turn off parallel test execution:

libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "1.6.0_0.3.1"

parallelExecution in Test := false

Check out the Wiki for usage details.

Use RDDComparisons to see if your RDD is as expected.
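To illustrate why a dedicated comparison helps, here is a rough sketch of an order-insensitive comparison between expected and actual records. It is not the library's implementation or API: it uses plain Scala `Seq` values as a stand-in for RDD contents, and `UnorderedCompare` and `firstMismatch` are hypothetical names. The assumption behind it is that the order of records coming back from a distributed job often is not guaranteed, so a test usually wants to compare elements and their counts rather than positions.

```scala
// Illustration only: plain Scala collections stand in for RDD contents.
// UnorderedCompare and firstMismatch are hypothetical names, not part of any library.
object UnorderedCompare {

  /** Counts how often each element occurs in the sequence. */
  private def occurrences[T](xs: Seq[T]): Map[T, Int] =
    xs.groupBy(identity).map { case (x, same) => x -> same.size }

  /**
   * Returns None when both sequences hold the same elements with the same
   * multiplicities, ignoring order. Otherwise returns
   * Some((element, expectedCount, actualCount)) for one element that differs.
   */
  def firstMismatch[T](expected: Seq[T], actual: Seq[T]): Option[(T, Int, Int)] = {
    val exp = occurrences(expected)
    val act = occurrences(actual)
    (exp.keySet ++ act.keySet).toSeq
      .map(k => (k, exp.getOrElse(k, 0), act.getOrElse(k, 0)))
      .find { case (_, e, a) => e != a }
  }

  def main(args: Array[String]): Unit = {
    // Same elements in a different order count as equal.
    assert(firstMismatch(Seq(1, 2, 3), Seq(3, 1, 2)).isEmpty)
    // A missing duplicate is reported together with both counts.
    assert(firstMismatch(Seq("a", "a"), Seq("a")) == Some(("a", 2, 1)))
    println("all checks passed")
  }
}
```

Returning the offending element along with both counts, instead of a bare true or false, makes a failing test much easier to diagnose. For the actual RDDComparisons methods and their signatures in your version of the library, the project wiki is the place to look.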

Some other Testing Resources for Apache Spark:

There are many options. I suggest trying a few, and definitely using Spark Testing Base and ScalaTest at a minimum. Always deploy locally first and try a subset of your data before moving to production. Develop test-driven and iteratively, just as you would with any other program you are writing.
