Testing Spark Code

An introduction to testing your Apache Spark code with Scala.

There are many ways to test your Spark code. For Java, you can write basic JUnit tests for the non-Spark pieces; for Scala, you can use ScalaTest. You can also run full integration tests by running Spark locally or on a small test cluster.
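To make the "non-Spark pieces" idea concrete, here is a minimal sketch of pulling record parsing out into a plain Scala function so it can be checked without a SparkContext. The `Visit` type, the `VisitParsing` object, and the `user,seconds` line format are all hypothetical names made up for illustration; they do not come from any library.

```scala
import scala.util.Try

// Hypothetical record type, used only for this illustration.
final case class Visit(user: String, durationSeconds: Int)

object VisitParsing {
  // Pure function with no Spark dependency, so it can be tested like any Scala code.
  // Expects lines such as "alice,42" and returns None for anything malformed.
  def parseLine(line: String): Option[Visit] =
    line.split(",", -1).map(_.trim) match {
      case Array(user, secs) if user.nonEmpty =>
        // Reject non-numeric and negative durations instead of throwing.
        Try(secs.toInt).toOption.filter(_ >= 0).map(Visit(user, _))
      case _ => None
    }
}

// Plain assertions; the same checks could sit inside a ScalaTest test(...) block
// or a JUnit @Test method.
object VisitParsingChecks {
  def run(): Unit = {
    assert(VisitParsing.parseLine("alice,42") == Some(Visit("alice", 42)))
    assert(VisitParsing.parseLine(" bob , 7 ") == Some(Visit("bob", 7)))
    assert(VisitParsing.parseLine("alice,abc").isEmpty)
    assert(VisitParsing.parseLine("alice,42,extra").isEmpty)
  }
}
```

In a Spark job, a function like this would typically be applied with something like `lines.flatMap(line => VisitParsing.parseLine(line))`, which leaves only the thin Spark wiring for the slower local or cluster tests.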

Another awesome choice is Spark-Testing-Base from Holden Karau.

There are a few presentations and articles about doing so:

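Add it to your SBT build for Spark 1.6.0, and turn off parallel test execution so that test suites do not start several SparkContexts in the same JVM at once:

libraryDependencies += "com.holdenkarau" %% "spark-testing-base" % "1.6.0_0.3.1" % "test"

parallelExecution in Test := false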

Check out the Wiki for usage details.

Use RDDComparisons to check that an RDD matches the result you expect.
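As a rough sketch of what such a comparison can look like, the suite below mixes `SharedSparkContext` and `RDDComparisons` into a ScalaTest `FunSuite`. The `WordLengths` transformation is a hypothetical example, not something from the library. The sketch also assumes a release in which `RDDComparisons` is a trait offering `assertRDDEquals`, and a ScalaTest version that still provides `org.scalatest.FunSuite`; older or newer releases may name these differently, so check the Wiki for the version you depend on.

```scala
import com.holdenkarau.spark.testing.{RDDComparisons, SharedSparkContext}
import org.apache.spark.rdd.RDD
import org.scalatest.FunSuite

// Hypothetical transformation under test; the name is illustrative only.
object WordLengths {
  // Splits each line on whitespace and pairs every word with its length.
  def apply(lines: RDD[String]): RDD[(String, Int)] =
    lines.flatMap(_.split("\\s+")).filter(_.nonEmpty).map(w => (w, w.length))
}

// SharedSparkContext supplies `sc`, a local SparkContext shared by the tests in this suite.
class WordLengthsSuite extends FunSuite with SharedSparkContext with RDDComparisons {

  test("word lengths match the expected RDD regardless of order") {
    val input = sc.parallelize(Seq("hi holden", "bye"))
    val expected = sc.parallelize(Seq(("bye", 3), ("hi", 2), ("holden", 6)))

    // Assumption: assertRDDEquals is the unordered comparison in this release;
    // other releases may expose the comparison under a different name.
    assertRDDEquals(expected, WordLengths(input))
  }
}
```

Comparing whole RDDs this way avoids calling `collect()` and sorting by hand in every test, which matters once expected outputs stop fitting comfortably in a local list.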

Some other Testing Resources for Apache Spark:

There are many options. I suggest trying a few, and definitely using Spark Testing Base and ScalaTest at a minimum. Always run locally first and try a subset of your data before moving to production. Develop test-driven and iteratively, just as you would with any other program you are writing.
